Abstract

We present an uncertainty relation for the representation of signals in two different general (possibly redundant or incomplete) signal sets. This uncertainty relation is relevant for the analysis of signals containing two distinct features, each of which can be described sparsely in a suitable general signal set. Furthermore, the new uncertainty relation is shown to lead to improved sparsity thresholds for the recovery of signals that are sparse in general dictionaries. Specifically, our results improve on the well-known (1 + 1/d)/2-threshold for dictionaries with coherence d by up to a factor of two. Finally, we provide probabilistic recovery guarantees for pairs of general dictionaries that also allow us to understand which parts of a general dictionary one needs to randomize over to "weed out" the sparsity patterns that prohibit breaking the square-root bottleneck.

I. Introduction and Outline 

A milestone in the sparse signal recovery literature is the uncertainty relation for the Fourier-identity pair found in [1]. This uncertainty relation was extended to pairs of arbitrary orthonormal bases (ONBs) in [2]. Besides being interesting in their own right, these uncertainty relations are fundamental in the formulation of recovery guarantees for signals that contain two distinct features, each of which can be described sparsely using an ONB. If, however, the individual features are sparse only in overcomplete signal sets (i.e., in frames [3]), the two-ONB results in [1], [2] cannot be applied. The goal of this paper is to find uncertainty relations and corresponding signal recovery guarantees for signals that are sparse in pairs of general (possibly redundant) signal sets. Redundancy in the individual signal sets allows us to succinctly describe a wider class of features. Concrete examples for this setup can be found in the feature extraction and morphological component analysis literature (see, e.g., [4], [5] and references therein).
In order to put our results into perspective and to detail our contributions, we first briefly recapitulate the formal setup considered in the sparse signal recovery literature [6], [7], [2], [8]–[11].

A. Sparse Signal Recovery Methods 

Consider the problem of recovering unknown vectors from a small number of linear non-adaptive measurements. More formally, let $\mathbf{x} \in \mathbb{C}^N$ be an unknown vector that is observed through a measurement matrix $\mathbf{D}$ with columns¹ $\mathbf{d}_i \in \mathbb{C}^M$, $i = 1, 2, \ldots, N$, according to

$$\mathbf{y} = \mathbf{D}\mathbf{x}$$

where $\mathbf{y} \in \mathbb{C}^M$ and $M \ll N$. If we do not impose additional assumptions on $\mathbf{x}$, the problem of recovering $\mathbf{x}$ from $\mathbf{y}$ is obviously ill-posed. The situation changes drastically if we assume that $\mathbf{x}$ is sparse in the sense of having only a few nonzero entries. More specifically, let $\|\mathbf{x}\|_0$ denote the number of nonzero entries of $\mathbf{x}$; then

$$(\mathrm{P0})\quad \text{minimize } \|\mathbf{x}\|_0 \ \text{ subject to } \ \mathbf{y} = \mathbf{D}\mathbf{x}$$

can recover $\mathbf{x}$ without prior knowledge of the positions of the nonzero entries of $\mathbf{x}$. Equivalently, we can interpret (P0) as the problem of finding the sparsest representation of the vector $\mathbf{y}$ in terms of the "dictionary elements" (columns) $\mathbf{d}_i$. In this context, the matrix $\mathbf{D}$ is often referred to as a dictionary.



Since (P0) is an NP-hard problem [12] (it requires a combinatorial search), it is computationally infeasible, even for moderate problem sizes N, M. Two popular and computationally more tractable alternatives to solving (P0) are basis pursuit (BP) [13], [6]–[8], [2], [9] and orthogonal matching pursuit (OMP) [14], [15], [9]. BP is a convex relaxation of the (P0) problem, namely

$$(\mathrm{BP})\quad \text{minimize } \|\mathbf{x}\|_1 \ \text{ subject to } \ \mathbf{y} = \mathbf{D}\mathbf{x}.$$

Here, $\|\mathbf{x}\|_1 = \sum_i |x_i|$ denotes the $\ell_1$-norm of the vector $\mathbf{x}$. OMP is an iterative greedy algorithm that constructs a sparse representation of $\mathbf{y}$ by selecting, in each iteration, the column of $\mathbf{D}$ most "correlated" with the difference between $\mathbf{y}$ and its current approximation.
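To make the greedy selection rule concrete, here is a minimal sketch of OMP in plain Python. It assumes a real-valued dictionary stored as a list of unit-norm columns and solves each least-squares step through the normal equations; both are simplifications for illustration (a numerical library would normally be used). The helper names `solve_least_squares` and `omp`, and the example dictionary (the identity concatenated with the normalized 8 x 8 Sylvester-Hadamard basis, coherence $1/\sqrt{8}$), are our own hypothetical choices, not taken from the paper.

```python
import math

def solve_least_squares(cols, y):
    """Least-squares coefficients of y on the given columns via the normal equations
    G c = r with G = A^T A and r = A^T y (adequate for a few well-conditioned columns)."""
    k = len(cols)
    aug = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] + [sum(p * q for p, q in zip(ci, y))]
           for ci in cols]
    for i in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(i, k), key=lambda j: abs(aug[j][i]))
        aug[i], aug[piv] = aug[piv], aug[i]
        for j in range(i + 1, k):
            f = aug[j][i] / aug[i][i]
            aug[j] = [u - f * v for u, v in zip(aug[j], aug[i])]
    c = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        c[i] = (aug[i][k] - sum(aug[i][j] * c[j] for j in range(i + 1, k))) / aug[i][i]
    return c

def omp(D, y, n_iter, tol=1e-10):
    """Orthogonal matching pursuit for a real dictionary D given as a list of unit-norm columns."""
    support, coeffs, residual = [], [], list(y)
    for _ in range(n_iter):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break
        # Select the column most correlated with the current residual.
        k = max((i for i in range(len(D)) if i not in support),
                key=lambda i: abs(sum(p * q for p, q in zip(D[i], residual))))
        support.append(k)
        # Project y orthogonally onto the span of all selected columns.
        coeffs = solve_least_squares([D[i] for i in support], y)
        approx = [sum(c * D[i][m] for c, i in zip(coeffs, support)) for m in range(len(y))]
        residual = [yv - av for yv, av in zip(y, approx)]
    x = [0.0] * len(D)
    for c, i in zip(coeffs, support):
        x[i] = c
    return x

# Example dictionary: identity I_8 followed by the normalized Sylvester-Hadamard basis.
H = [[1]]
for _ in range(3):
    H = [row + row for row in H] + [row + [-v for v in row] for row in H]
M = 8
D = [[1.0 if m == k else 0.0 for m in range(M)] for k in range(M)]
D += [[v / math.sqrt(M) for v in row] for row in H]  # H is symmetric, so rows are columns

x_true = [0.0] * (2 * M)
x_true[1], x_true[M + 3] = 2.0, -1.5   # one spike, one Hadamard column
y = [sum(x_true[i] * D[i][m] for i in range(2 * M)) for m in range(M)]
x_hat = omp(D, y, n_iter=2)
```

With two nonzero entries, the example stays below the BP/OMP threshold for two ONBs reviewed in Section II-A, which for $d = 1/\sqrt{8}$ allows fewer than $(\sqrt{2} - 0.5)\sqrt{8} \approx 2.59$ nonzero entries, so exact recovery is expected.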

Two questions that arise naturally are: 1) Under which conditions is $\mathbf{x}$ the unique solution of (P0)? 2) Under which conditions is this solution delivered by BP and/or OMP? Answers to these questions are typically expressed in terms of sparsity thresholds on the unknown vector $\mathbf{x}$ [6]–[8], [2], [9]. Such sparsity thresholds may hold for all possible sparsity patterns and values of the nonzero entries in $\mathbf{x}$, in which case we speak of deterministic sparsity thresholds. Alternatively, one may be interested in so-called probabilistic or (following the terminology used in [10]) robust sparsity thresholds, which hold for most sparsity patterns and values of the nonzero entries in $\mathbf{x}$. Intuitively, robust sparsity thresholds are larger than deterministic ones. More precisely, as the number of measurements $M$ grows large, deterministic sparsity thresholds generally scale at best as $\sqrt{M}$. Robust sparsity thresholds, in contrast, break this square-root bottleneck. In particular, they scale on the order of $M/(\log N)$ [11]. However, this comes at a price: Uniqueness of the solution of (P0)² and recoverability of the (P0)-solution through BP are guaranteed only with high probability with respect to the choice of $\mathbf{x}$³.

¹ Throughout the paper, we shall assume that the columns of $\mathbf{D}$ span $\mathbb{C}^M$ and have unit $\ell_2$-norm.

Both deterministic and probabilistic sparsity thresholds are typically expressed in terms of the dictionary coherence, defined as the maximum absolute value over all inner products between pairs of distinct columns of $\mathbf{D}$.

An alternative approach is to assume that the dictionary $\mathbf{D}$ is random (rather than the vector $\mathbf{x}$) and to determine thresholds that hold for all (sufficiently) sparse $\mathbf{x}$ with high probability with respect to the choice of $\mathbf{D}$ [17]–[19]. Throughout this paper, we consider deterministic dictionaries exclusively.

Note that when considering signals that consist of two distinct features, each of which can be described sparsely using an ONB [2], [6], [20], [9], the corresponding dictionary $\mathbf{D}$ is given by the concatenation of these two ONBs. One obvious way of obtaining recovery guarantees for signals that are sparse in pairs of general signal sets is to concatenate these general signal sets, view the concatenation as one (general) dictionary, and apply the sparsity thresholds for general dictionaries reported in, e.g., [7]–[9], [11]. However, these sparsity thresholds depend only on the coherence of the resulting overall dictionary $\mathbf{D}$ and, in particular, do not take into account the coherence parameters of the two constituent signal sets.

In this paper, we show that the sparsity thresholds can be significantly improved not only if $\mathbf{D}$ is the concatenation of two ONBs, as was done in [2], [8], [20], [9], but also if $\mathbf{D}$ is the concatenation of two general signal sets (or sub-dictionaries) with known coherence parameters.



B. Contributions 

Our contributions can be detailed as follows. Based on a novel uncertainty relation for pairs of general (redundant or incomplete) signal sets, we obtain a new deterministic sparsity threshold guaranteeing (P0)-uniqueness for dictionaries that are given by the concatenation of two general sub-dictionaries with known coherence parameters. Additionally, we derive a threshold guaranteeing that BP and OMP recover this unique (P0)-solution. Our thresholds improve significantly on the known deterministic sparsity thresholds one would obtain if the concatenation of two sub-dictionaries were viewed as a general dictionary, thereby ignoring the additional information about the sub-dictionaries' coherence parameters. More precisely, this improvement can be up to a factor of two. Moreover, the known sparsity thresholds for general dictionaries and those for the concatenation of two ONBs follow from our results for the concatenation of general sub-dictionaries as special cases.

² Whenever we speak of uniqueness of the solution of (P0), we mean that the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$.

³ Robust sparsity thresholds for OMP to deliver the unique (P0)-solution are still unknown. For the multichannel scenario, first results along these lines were reported in [16], where it is shown that the probability of reconstruction error decays exponentially with the number of channels.

Concerning probabilistic sparsity thresholds for the concatenation of two general dictionaries, we address the following question: Given a general dictionary, can we break the square-root bottleneck while only randomizing the sparsity patterns over a certain part of the overall dictionary? By extending the known results for the two-ONB setting [10], [11] to the concatenation of two general dictionaries, we show that the answer is in the affirmative. Our results allow us to identify the parts of a general dictionary over which the sparsity patterns need to be randomized so as to break the square-root bottleneck.



C. Notation 

We use lowercase boldface letters for column vectors, e.g., $\mathbf{x}$, and uppercase boldface letters for matrices, e.g., $\mathbf{D}$. For a given matrix $\mathbf{D}$, we denote its $i$th column by $\mathbf{d}_i$, its conjugate transpose by $\mathbf{D}^H$, and its Moore–Penrose inverse by $\mathbf{D}^\dagger$. Slightly abusing notation, we say that $\mathbf{d} \in \mathbf{D}$ if $\mathbf{d}$ is a column of the matrix $\mathbf{D}$. The spectral norm of a matrix $\mathbf{D}$ is $\|\mathbf{D}\| = \sqrt{\lambda_{\max}(\mathbf{D}^H\mathbf{D})}$, where $\lambda_{\max}(\mathbf{D}^H\mathbf{D})$ denotes the maximum eigenvalue of $\mathbf{D}^H\mathbf{D}$. The minimum and maximum singular values of $\mathbf{D}$ are denoted by $\sigma_{\min}(\mathbf{D})$ and $\sigma_{\max}(\mathbf{D})$, respectively; $\operatorname{rank}(\mathbf{D})$ stands for the rank of $\mathbf{D}$, $\|\mathbf{D}\|_{1,2} = \max_i\{\|\mathbf{d}_i\|_2\}$, and $\|\mathbf{D}\|_{1,1} = \max_i\{\|\mathbf{d}_i\|_1\}$. The smallest eigenvalue of the positive-semidefinite matrix $\mathbf{G}$ is denoted by $\lambda_{\min}(\mathbf{G})$. We use $\mathbf{I}_n$ to refer to the $n \times n$ identity matrix; $\mathbf{0}_{m,n}$ and $\mathbf{1}_{m,n}$ stand for the all-zero and all-one matrix of size $m \times n$, respectively. We denote the $n$-dimensional all-ones and all-zeros column vector by $\mathbf{1}_n$ and $\mathbf{0}_n$, respectively. The natural logarithm is referred to as $\log$. The set of all positive integers is $\mathbb{N}^+$. For two functions $f(x)$ and $g(x)$, the notation $f(x) = \Omega(g(x))$ means that there exists a real number $x_0$ such that $|f(x)| \ge k_1|g(x)|$ for all $x \ge x_0$, where $k_1$ is a finite constant. The notation $f(x) = O(g(x))$ means that there exists a real number $x_0$ such that $|f(x)| \le k_2|g(x)|$ for all $x \ge x_0$, where $k_2$ is a finite constant. Furthermore, we write $f(x) = \Theta(g(x))$ if there exist a real number $x_0$ and finite constants $k_1$ and $k_2$ such that $k_1|g(x)| \le |f(x)| \le k_2|g(x)|$ for all $x \ge x_0$. For $u \in \mathbb{R}$, we define $[u]^+ = \max\{0, u\}$. Whenever we say that a vector $\mathbf{x} \in \mathbb{C}^N$ has a randomly chosen sparsity pattern of cardinality $L$, we mean that the support set of $\mathbf{x}$ (i.e., the set of nonzero entries of $\mathbf{x}$) is chosen uniformly at random among all $\binom{N}{L}$ possible support sets of cardinality $L$.
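The sampling model in the last definition can be sketched as follows, assuming Python's standard `random` module; `random_support` is a hypothetical helper name. The empirical check simply confirms that each of the $\binom{4}{2} = 6$ supports appears with a frequency close to $1/6$.

```python
import random
from collections import Counter
from itertools import combinations

def random_support(N, L, rng):
    """Support set of cardinality L, uniform over all C(N, L) subsets of {0, ..., N-1}."""
    return tuple(sorted(rng.sample(range(N), L)))

rng = random.Random(0)                 # fixed seed for reproducibility
N, L, trials = 4, 2, 60_000
counts = Counter(random_support(N, L, rng) for _ in range(trials))
# Empirical frequency of every possible support; each should be close to 1/6.
freqs = {S: counts[S] / trials for S in combinations(range(N), L)}
```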

II. Deterministic Sparsity Thresholds 
A. A Brief Review of Relevant Previous Work 

A quantity that is intimately related to the uniqueness of the solution of (P0) is the spark of a dictionary $\mathbf{D}$, defined as the smallest number of linearly dependent columns of $\mathbf{D}$ [7]. More specifically, the following result holds [7], [8]: For a given dictionary $\mathbf{D}$ and measurement outcome $\mathbf{y} = \mathbf{D}\mathbf{x}$, the unique solution of (P0) is given by $\mathbf{x}$ if

$$\|\mathbf{x}\|_0 < \frac{\operatorname{spark}(\mathbf{D})}{2}. \tag{1}$$
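For very small dictionaries the spark can be computed by exhaustive search, which also illustrates why this is intractable in general: the number of column subsets grows exponentially in $N$. The sketch below is an illustration under stated assumptions: it uses a hand-written Gauss-Jordan rank with a fixed tolerance of $10^{-9}$, and the example (our choice) is the concatenation of the identity and the unitary DFT basis in $\mathbb{C}^4$, which has coherence $d = 1/2$. A Dirac comb with two spikes is also a combination of two Fourier columns, so four columns are linearly dependent and the spark equals $4$.

```python
import cmath
import math
from itertools import combinations

def numerical_rank(cols, tol=1e-9):
    """Rank of the matrix with the given (complex) columns, via Gauss-Jordan elimination."""
    rows = [list(r) for r in zip(*cols)]  # row-major copy
    rank = 0
    for c in range(len(cols)):
        if rank == len(rows):
            break
        piv = max(range(rank, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[piv][c]) < tol:       # no usable pivot in this column
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank:
                f = rows[i][c] / rows[rank][c]
                rows[i] = [u - f * v for u, v in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def spark(cols):
    """Smallest number of linearly dependent columns (exhaustive search, exponential in N)."""
    for k in range(1, len(cols) + 1):
        if any(numerical_rank(list(sub)) < k for sub in combinations(cols, k)):
            return k
    return math.inf  # all columns are linearly independent

M = 4
identity = [[1.0 if m == k else 0.0 for m in range(M)] for k in range(M)]
fourier = [[cmath.exp(2j * math.pi * m * k / M) / math.sqrt(M) for m in range(M)] for k in range(M)]
D = identity + fourier  # two ONBs with coherence d = 1/sqrt(M) = 1/2
```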

Unfortunately, determining the spark of a dictionary is an NP-hard problem, i.e., a problem that is as hard as solving (P0) directly. It is possible, though, to derive easy-to-compute lower bounds on $\operatorname{spark}(\mathbf{D})$ that are explicit in the coherence of $\mathbf{D}$, defined as

$$d = \max_{i \neq j} \left|\mathbf{d}_i^H \mathbf{d}_j\right|. \tag{2}$$
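The coherence in (2) is straightforward to evaluate. A minimal sketch, assuming the dictionary is stored as a list of unit-norm columns with possibly complex entries; the example (our choice) pairs the identity with the unitary DFT basis in $\mathbb{C}^{16}$, for which $d = 1/\sqrt{16} = 0.25$, while the DFT basis alone has coherence zero.

```python
import cmath
import math

def coherence(cols):
    """d = max_{i != j} |d_i^H d_j| for a dictionary given as a list of unit-norm columns."""
    def inner(u, v):
        # d_i^H d_j: conjugate the first argument
        return sum(p.conjugate() * q for p, q in zip(u, v))
    n = len(cols)
    return max(abs(inner(cols[i], cols[j])) for i in range(n) for j in range(i + 1, n))

M = 16
identity = [[1.0 if m == k else 0.0 for m in range(M)] for k in range(M)]
fourier = [[cmath.exp(2j * math.pi * m * k / M) / math.sqrt(M) for m in range(M)] for k in range(M)]
d_two_onb = coherence(identity + fourier)   # 1/sqrt(M) = 0.25
d_fourier = coherence(fourier)              # orthonormal columns: 0 up to rounding
```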

We next briefly review these lower bounds. Let us first consider the case where $\mathbf{D}$ is the concatenation of two ONBs. Denote the set of all dictionaries that are the concatenation of two ONBs and have coherence $d$ by $\mathcal{D}_{\mathrm{onb}}(d)$. It was shown in [2] that for $\mathbf{D} \in \mathcal{D}_{\mathrm{onb}}(d)$, we have

$$\operatorname{spark}(\mathbf{D}) \ge \frac{2}{d}. \tag{3}$$

Substituting (3) into (1) yields the following sparsity threshold guaranteeing that the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$:

$$\|\mathbf{x}\|_0 < \frac{1}{d}. \tag{4}$$



Furthermore, it was shown in [2], [20], [9] that for this unique solution to be recovered by BP and OMP it is sufficient to have

$$\|\mathbf{x}\|_0 < \frac{\sqrt{2} - 0.5}{d} \approx \frac{0.9}{d}. \tag{5}$$



A question that arises naturally is: What happens if the dictionary $\mathbf{D}$ is not the concatenation of two ONBs? There exist sparsity thresholds in terms of $d$ for general dictionaries. Specifically, let us denote the set of all dictionaries with coherence $d$ by $\mathcal{D}_{\mathrm{gen}}(d)$. It was shown in [7]–[9] that for $\mathbf{D} \in \mathcal{D}_{\mathrm{gen}}(d)$ we have

$$\operatorname{spark}(\mathbf{D}) \ge 1 + \frac{1}{d}. \tag{6}$$

Using (6) in (1) yields the following sparsity threshold guaranteeing that the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$:

$$\|\mathbf{x}\|_0 < \frac{1}{2}\left(1 + \frac{1}{d}\right). \tag{7}$$

Interestingly, one can show that (7) also guarantees that BP and OMP recover the unique (P0)-solution [7]–[9].

The set $\mathcal{D}_{\mathrm{gen}}(d)$ is large, in general, and contains a variety of structurally very different dictionaries, ranging from equiangular tight frames (where the absolute values of the inner products between any two distinct dictionary elements are equal) to dictionaries where the maximum inner product is achieved by one pair only. The sparsity threshold in (7) is therefore inevitably rather crude. Better sparsity thresholds are possible if one considers subsets of $\mathcal{D}_{\mathrm{gen}}(d)$, such as, e.g., $\mathcal{D}_{\mathrm{onb}}(d) \subset \mathcal{D}_{\mathrm{gen}}(d)$. A dictionary $\mathbf{D} \in \mathcal{D}_{\mathrm{onb}}(d)$ also satisfies $\mathbf{D} \in \mathcal{D}_{\mathrm{gen}}(d)$, and, hence, the sparsity threshold in (7) applies. However, the additional structural information about $\mathbf{D}$ being the concatenation of two ONBs, i.e., $\mathbf{D} \in \mathcal{D}_{\mathrm{onb}}(d)$, allows us to obtain the improved sparsity thresholds in (4) and (5), which are (for $d \ll 1$) almost a factor of two higher (better) than the threshold in (7). As a side remark, we note that the threshold for the two-ONB case in (5) drops below that in (7), valid for general dictionaries, if $d > 2(\sqrt{2} - 1)$. This is surprising, as exploiting structural information should lead to a higher sparsity threshold. We will show in Section II-B that one can refine the threshold in (5) so as to fix this problem.
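The three thresholds reviewed above are easy to tabulate. The following sketch (function names are ours) evaluates (4), (5), and (7) for a few coherence values and checks the crossover $d = 2(\sqrt{2} - 1) \approx 0.83$, obtained by equating $(\sqrt{2} - 0.5)/d$ with $(1 + 1/d)/2$, beyond which (5) falls below (7).

```python
import math

def thr_p0_onb(d):        # (4): ||x||_0 < 1/d, two ONBs
    return 1 / d

def thr_bp_omp_onb(d):    # (5): ||x||_0 < (sqrt(2) - 0.5)/d, two ONBs
    return (math.sqrt(2) - 0.5) / d

def thr_general(d):       # (7): ||x||_0 < (1 + 1/d)/2, general dictionaries
    return (1 + 1 / d) / 2

d_cross = 2 * (math.sqrt(2) - 1)   # (5) and (7) coincide here
for d in (0.01, 0.1, 0.5, d_cross, 0.9):
    print(f"d={d:.3f}  (4)={thr_p0_onb(d):8.3f}  (5)={thr_bp_omp_onb(d):8.3f}  (7)={thr_general(d):8.3f}")
```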

B. Novel Deterministic Sparsity Thresholds for the Concatenation of Two General Signal Sets 

We consider dictionaries with coherence $d$ that consist of two sub-dictionaries with coherence $a$ and $b$, respectively. The set of all such dictionaries will be denoted as $\mathcal{D}(d, a, b)$. A dictionary $\mathbf{D} \in \mathcal{D}(d, a, b)$ of dimension $M \times N$ (with $N > M$) can be written as $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$, where the sub-dictionary $\mathbf{A} \in \mathbb{C}^{M \times N_a}$ has coherence $a$ and the sub-dictionary $\mathbf{B} \in \mathbb{C}^{M \times N_b}$ has coherence $b$. We remark that the two sub-dictionaries need not be ONBs, need not have the same number of elements, and need not span $\mathbb{C}^M$, but their concatenation is assumed to span $\mathbb{C}^M$. Without loss of generality, we assume, throughout the paper, that $a \le b$. For fixed $d$⁴, we have that $\mathcal{D}(d, a, b) \subseteq \mathcal{D}_{\mathrm{gen}}(d)$. Hence, we consider subsets $\mathcal{D}(d, a, b)$ of the set $\mathcal{D}_{\mathrm{gen}}(d)$ parametrized by the coherence parameters $a$ and $b$.

For $\mathbf{D} \in \mathcal{D}(d, a, b)$, we derive sparsity thresholds in terms of $d$, $a$, and $b$ and show that these thresholds improve upon that in (7) for general dictionaries $\mathbf{D} \in \mathcal{D}_{\mathrm{gen}}(d)$. This improvement is a result of the restriction to a subset of dictionaries in $\mathcal{D}_{\mathrm{gen}}(d)$, namely $\mathcal{D}(d, a, b)$, and of exploiting the additional structural information (in terms of the coherence parameters $a$ and $b$) available about dictionaries $\mathbf{D}$ in this subset.

Every dictionary in $\mathcal{D}_{\mathrm{gen}}(d)$ can be viewed as the concatenation of two sub-dictionaries. Our results therefore state that viewing a dictionary $\mathbf{D} \in \mathcal{D}_{\mathrm{gen}}(d)$ as the concatenation of two sub-dictionaries leads to improved sparsity thresholds, provided that the coherence parameters $a$ and $b$ of the respective sub-dictionaries are available. Moreover, the improvements will be seen to be up to a factor of two if $a$ and $b$ are sufficiently small.

The sparsity threshold for uniqueness of the solution of (P0) for dictionaries $\mathbf{D} \in \mathcal{D}(d, a, b)$, formalized in Theorem 2 below, is based on a novel uncertainty relation for pairs of general dictionaries, stated in the following lemma.

⁴ We assume throughout the paper that $d > 0$. For $d = 0$, the dictionary $\mathbf{D}$ consists of orthonormal columns, and, hence, every unknown vector $\mathbf{x}$ can be uniquely recovered from the measurement outcome $\mathbf{y}$ according to $\mathbf{x} = \mathbf{D}^H\mathbf{y}$.



January 11, 2013 



DRAFT 






Lemma 1: Let $\mathbf{A} \in \mathbb{C}^{M \times N_a}$ be a dictionary with coherence $a$, $\mathbf{B} \in \mathbb{C}^{M \times N_b}$ a dictionary with coherence $b$, and denote the coherence of the concatenated dictionary $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$, $\mathbf{D} \in \mathbb{C}^{M \times N}$, by $d$. For every vector $\mathbf{s} \in \mathbb{C}^M$ that can be represented as a linear combination of $n_a$ columns of $\mathbf{A}$ and, equivalently, as a linear combination of $n_b$ columns of $\mathbf{B}$⁵, the following inequality holds:

$$n_a n_b \ge \frac{[1 - a(n_a - 1)]^+\,[1 - b(n_b - 1)]^+}{d^2}. \tag{8}$$

Proof: See Appendix A. ■

The uncertainty relation for the two-ONB case derived in [2] is a special case of (8). In particular, if $a = b = 0$, then (8) reduces to the result reported in [2, Thm. 1]:

$$n_a n_b \ge \frac{1}{d^2}. \tag{9}$$
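A worked check of (8) and (9): for the identity and the unitary DFT basis in $\mathbb{C}^{16}$ ($a = b = 0$, $d = 1/4$), a Dirac comb with spacing 4 has $n_a = 4$ spikes and its DFT has $n_b = 4$ nonzero coefficients, so the comb is a combination of 4 spikes and, equivalently, of 4 Fourier columns, and $n_a n_b = 16 = 1/d^2$; that is, (9) holds with equality. The sketch below (helper names are ours, plain Python) verifies this numerically and evaluates the right-hand side of (8).

```python
import cmath
import math

def lemma1_rhs(n_a, n_b, a, b, d):
    """Right-hand side of (8): [1 - a(n_a - 1)]^+ [1 - b(n_b - 1)]^+ / d^2."""
    pos = lambda u: max(0.0, u)
    return pos(1 - a * (n_a - 1)) * pos(1 - b * (n_b - 1)) / d**2

def unitary_dft(v):
    M = len(v)
    return [sum(v[m] * cmath.exp(-2j * math.pi * k * m / M) for m in range(M)) / math.sqrt(M)
            for k in range(M)]

M, step = 16, 4
comb = [1.0 if m % step == 0 else 0.0 for m in range(M)]   # M/step spikes
n_a = sum(abs(v) > 1e-9 for v in comb)                      # terms in the identity basis
n_b = sum(abs(v) > 1e-9 for v in unitary_dft(comb))         # terms in the Fourier basis
d = 1 / math.sqrt(M)                                        # coherence of [I, F]
print(n_a, n_b, n_a * n_b, lemma1_rhs(n_a, n_b, 0.0, 0.0, d))
```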

Note that, differently from [2, Thm. 1], the lower bound in (9) holds not only for the concatenation of two ONBs, but also for the concatenation of two sub-dictionaries $\mathbf{A}$ and $\mathbf{B}$ that contain orthonormal columns but individually do not necessarily span $\mathbb{C}^M$ (but their concatenation spans $\mathbb{C}^M$). Lemma 1 allows us to easily recover several other well-known results such as, e.g., the lower bound in (6) on the spark of a dictionary. To see this, note that when $n_b = 0$ in Lemma 1 (and thus $\mathbf{s} = \mathbf{0}_M$ by definition), the $n_a$ columns in $\mathbf{A}$ participating in the representation of $\mathbf{s}$ are linearly dependent. Moreover, for $n_b = 0$ we have $[1 - b(n_b - 1)]^+ = 1 + b > 0$. Since the left-hand side of (8) is zero, it follows that necessarily $[1 - a(n_a - 1)]^+ = 0$ and thus $n_a \ge 1 + 1/a$, which agrees with the lower bound on the spark of the (sub-)dictionary $\mathbf{A}$ [7]–[9]. A similar observation follows for $n_a = 0$.

More importantly, Lemma 1 also allows us to derive a new lower bound on the spark of the overall dictionary $\mathbf{D} = [\mathbf{A}\ \mathbf{B}] \in \mathcal{D}(d, a, b)$. When used in (1), this result then yields a new sparsity threshold guaranteeing uniqueness of the (P0)-solution. We show that this threshold improves upon that in (7), which would be obtained if we viewed $\mathbf{D}$ simply as a general dictionary in $\mathcal{D}_{\mathrm{gen}}(d)$, thereby ignoring the fact that the dictionary under consideration belongs to a subset, namely $\mathcal{D}(d, a, b)$, of $\mathcal{D}_{\mathrm{gen}}(d)$.

Theorem 2: For $\mathbf{D} \in \mathcal{D}(d, a, b)$, a sufficient condition for the vector $\mathbf{x}$ to be the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is that

$$\|\mathbf{x}\|_0 < \frac{\hat{x} + f(\hat{x})}{2} \tag{10}$$

where

$$f(x) = \frac{(1+a)(1+b) - x b (1+a)}{x(d^2 - ab) + a(1+b)}$$

and $\hat{x} = \min\{x_b, x_s\}$. Furthermore,

$$x_b = \frac{1+b}{b+d^2}$$

and

$$x_s = \begin{cases} \dfrac{1}{d}, & \text{if } a = b = d,\\[2ex] \dfrac{d\sqrt{(1+a)(1+b)} - a - ab}{d^2 - ab}, & \text{otherwise.} \end{cases}$$

Proof: See Appendix B. ■

⁵ For $n_a = 0$ or $n_b = 0$, we define $\mathbf{s} = \mathbf{0}_M$. We exclude the trivial case $n_a = n_b = 0$.
The sparsity threshold in (10) reduces to that in (7) when $b = d$ (irrespective of $a$) or when $d = 1$ (irrespective of $a$ and $b$). Hence, the sparsity threshold in (10) does not improve upon that in (7) if the pair of columns achieving the overall dictionary coherence $d$ appears in the same sub-dictionary $\mathbf{B}$ (recall that we assumed $b \ge a$), or if $d = 1$. In all other cases, the sparsity threshold in (10) can be shown to be strictly larger than that in (7); this result is proven in Appendix C. The improvement can be up to a factor of two. We demonstrate this for the special case $a = b$, for which the sparsity threshold in (10) takes a particularly simple form. In this case, as can easily be verified, $x_s = f(x_s) = (1+b)/(d+b) \le x_b$, so that (10) reduces to

$$\|\mathbf{x}\|_0 < \frac{1+b}{d+b}. \tag{11}$$

For $a = b = 0$, the sparsity threshold in (11) reduces to the known sparsity threshold for dictionaries in $\mathcal{D}_{\mathrm{onb}}(d)$ specified in (4). Note, however, that the threshold in (11) with $b = 0$ holds for all $\mathbf{D} \in \mathcal{D}(d, 0, 0)$, thereby also including sub-dictionaries $\mathbf{A}$ and $\mathbf{B}$ that contain orthonormal columns but do not necessarily individually span $\mathbb{C}^M$ (but their concatenation spans $\mathbb{C}^M$). To see that the improvement over (7) can approach a factor of two, set $b = \epsilon d$ with $\epsilon \in [0, 1]$ and note that for $d \ll 1$ the ratio between the sparsity threshold in (11) and that in (7) is roughly $2/(1 + \epsilon)$, which is almost two for $\epsilon \ll 1$. Note that, for small coherence parameters $a$ and $b$, the elements in each of the two sub-dictionaries $\mathbf{A}$ and $\mathbf{B}$ are close to being orthogonal to each other. Fig. 1 shows the sparsity threshold in (11) for $d = 0.01$ as a function of $b$. We can see that for $b \ll d$ the threshold in (11) is, indeed, almost a factor of two larger than that in (7).
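To make Theorem 2 concrete, the sketch below evaluates the right-hand side of (10) for given $(d, a, b)$, reading (10) as $\|\mathbf{x}\|_0 < (\hat{x} + f(\hat{x}))/2$ with $f(x) = [(1+a)(1+b) - xb(1+a)]/[x(d^2 - ab) + a(1+b)]$ (the smallest $n_a$ permitted by (8) when $n_b = x$), $x_b = (1+b)/(b + d^2)$, $x_s$ as in the theorem, and $\hat{x} = \min\{x_b, x_s\}$. This reading is our interpretation; as a consistency check, it returns $1/d$ for $a = b = 0$, $(1 + 1/d)/2$ for $b = d$, and $(1+b)/(d+b)$ for $a = b$, in line with the special cases discussed above. The function names are ours, and $0 \le a \le b \le d$, $d > 0$ are assumed.

```python
import math

def theorem2_threshold(d, a, b):
    """Right-hand side of (10), read as (x_hat + f(x_hat))/2, for 0 <= a <= b <= d, d > 0."""
    def f(x):
        # smallest n_a compatible with the uncertainty relation (8) when n_b = x
        return ((1 + a) * (1 + b) - x * b * (1 + a)) / (x * (d**2 - a * b) + a * (1 + b))
    x_b = (1 + b) / (b + d**2)
    if a == b == d:
        x_s = 1 / d
    else:
        x_s = (d * math.sqrt((1 + a) * (1 + b)) - a - a * b) / (d**2 - a * b)
    x_hat = min(x_b, x_s)
    return (x_hat + f(x_hat)) / 2

def thr_general(d):  # (7)
    return (1 + 1 / d) / 2

d = 0.01
for b in (0.0, 0.001, 0.005, 0.01):
    t = theorem2_threshold(d, b, b)   # special case a = b
    print(f"b={b:.3f}  (10)={t:7.2f}  (7)={thr_general(d):6.2f}  ratio={t / thr_general(d):.3f}")
```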

So far, we focused on thresholds guaranteeing (P0)-uniqueness. We next present thresholds guaranteeing recovery of the unique (P0)-solution via BP and OMP for dictionaries $\mathbf{D} \in \mathcal{D}(d, a, b)$. The recovery conditions we report in Theorem 3 and Corollary 4 below depend on $b$ and $d$, but not on $a$. Slightly improved thresholds that also depend on $a$ can be derived following similar ideas as in the proofs of Theorem 3 and Corollary 4. The resulting expressions are, however, unwieldy and will therefore not be presented here.

Theorem 3: Suppose that $\mathbf{y} \in \mathbb{C}^M$ can be represented as $\mathbf{y} = \mathbf{D}\mathbf{x}$, where $\mathbf{x}$ has $n_a$ nonzero entries corresponding to columns of $\mathbf{A}$ and $n_b$ nonzero entries corresponding to columns of $\mathbf{B}$. Without loss of generality, we assume that $n_a \le n_b$. A sufficient condition for BP and OMP to recover $\mathbf{x}$ is

$$2n_a(1+b)b + n_b(1+b)(d+b) + 2n_a n_b(d^2 - b^2) < (1+b)^2. \tag{12}$$
[Figure 1: sparsity thresholds plotted as a function of b; legend entries include "(P0) threshold for $\mathcal{D}_{\mathrm{onb}}(d)$ [Eq. (4)]", "BP/OMP threshold for $\mathcal{D}_{\mathrm{onb}}(d)$ [Eq. (5)]", and "BP/OMP threshold for $\mathcal{D}(d, a = b, b)$ [Eq. (13)]".]

Figure 1. Deterministic sparsity thresholds guaranteeing uniqueness of (P0) and recoverability via BP and OMP for dictionaries in $\mathcal{D}_{\mathrm{gen}}(d)$, $\mathcal{D}_{\mathrm{onb}}(d)$, and $\mathcal{D}(d, a, b)$. We set $d = 0.01$ and consider the special case $a = b$. Note that for $a = b$, the threshold in (10) reduces to that in (11).



Proof: See Appendix D. ■

Theorem 3 generalizes the result in [2, Sec. 6], [9, Cor. 3.8] for the concatenation of two ONBs to dictionaries $\mathbf{D} \in \mathcal{D}(d, a, b)$. In particular, (12) reduces to [9, Eq. (16)] when $b = 0$ (since $a \le b$, this implies $a = 0$). Furthermore, when $b = d$, the condition in (12) simplifies to $n_a + n_b < (1 + 1/d)/2$, thereby recovering the sparsity threshold in (7). Thus, if the pair of columns achieving the overall dictionary coherence is in the same sub-dictionary $\mathbf{B}$ (recall that we assumed $b \ge a$), no improvement over the well-known $(1 + 1/d)/2$-threshold for dictionaries in $\mathcal{D}_{\mathrm{gen}}(d)$ is obtained. Theorem 3 depends explicitly on $n_a$ and $n_b$. In the following corollary, we provide a recovery guarantee in the form of a sparsity threshold that depends on $n_a$ and $n_b$ only through the overall sparsity level of $\mathbf{x}$ according to $\|\mathbf{x}\|_0 = n_a + n_b$.
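Condition (12) can be checked mechanically. The sketch below (hypothetical helper names) tests (12) for a given pair $(n_a, n_b)$ and, by brute force over all splits with $n_a \le n_b$, finds the largest total sparsity $S = n_a + n_b$ for which (12) holds for every split. Since the left-hand side of (12) grows with both $n_a$ and $n_b$, the search can stop at the first failing $S$. For $d = 0.1$ it returns 5 when $b = d$ (matching $(1 + 1/d)/2 = 5.5$) and 9 when $b = 0$ (matching $(\sqrt{2} - 0.5)/d \approx 9.14$).

```python
def satisfies_12(n_a, n_b, b, d):
    """Sufficient condition (12) of Theorem 3 (stated for n_a <= n_b)."""
    lhs = 2 * n_a * (1 + b) * b + n_b * (1 + b) * (d + b) + 2 * n_a * n_b * (d**2 - b**2)
    return lhs < (1 + b) ** 2

def largest_total_sparsity(b, d, n_max=10_000):
    """Largest S such that (12) holds for every split n_a + n_b = S with n_a <= n_b."""
    best = 0
    for S in range(1, n_max + 1):
        if all(satisfies_12(n_a, S - n_a, b, d) for n_a in range(S // 2 + 1)):
            best = S
        else:
            break  # (12) only gets harder as S grows
    return best

for b in (0.0, 0.02, 0.05, 0.1):
    print(f"d=0.1, b={b:.2f}: largest admissible ||x||_0 = {largest_total_sparsity(b, 0.1)}")
```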

Corollary 4: For $\mathbf{D} \in \mathcal{D}(d, a, b)$, a sufficient condition for BP and OMP to deliver the unique solution of (P0) is

$$\|\mathbf{x}\|_0 < \begin{cases} \dfrac{(1+b)\left[\xi - (d + 3b)\right]}{2(d^2 - b^2)}, & \text{if } b < d \text{ and } \kappa(d,b) > 1,\\[2ex] \dfrac{1 + 2d^2 + 3b - d(1+b)}{2(d^2 + b)}, & \text{otherwise} \end{cases} \tag{13}$$
with … and $\xi = 2\sqrt{2}\sqrt{d(b+d)}$.

Proof: See Appendix [E] ■ 
The sparsity threshold in (13) reduces to the sparsity threshold in (7) when $b = d$ or when $d = 1$ (irrespective of $b$). In all other cases, the sparsity threshold in (13) is strictly larger than that in (7) (see Appendix F). The threshold in (13) is complicated, as we have to deal with two different cases. The distinction between these two cases is, however, crucial to ensure that the threshold in (13) does not fall below that in (7).⁶ It turns out that the first case in (13) is active whenever $b < d < 3/5$, which covers essentially all practically relevant cases. In fact, for dictionaries with coherence $d > 3/5$, the sparsity threshold in (13) allows for at most one nonzero entry in $\mathbf{x}$.

The improvement of the sparsity threshold in (13) over that in (7) can be up to a factor of almost two. This can be seen by setting $b = \epsilon d$ with $\epsilon \in [0, 1)$ and noting that for $d \ll 1$ the ratio between the sparsity threshold in the first case in (13) and that in (7) is roughly $\left(2\sqrt{2(1 + \epsilon)} - (1 + 3\epsilon)\right)/(1 - \epsilon^2)$, which for $\epsilon \ll 1$ is approximately 1.8. Fig. 1 shows the threshold in (13) for $d = 0.01$ as a function of $b$. We can see that for $b \ll d$ the threshold in (13) is, indeed, almost a factor of two larger than that in (7).
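The first case of (13) and the small-$d$ ratio quoted above can be checked numerically. The sketch assumes $\xi = 2\sqrt{2d(b+d)}$ and implements only the first case, $(1+b)[\xi - (d+3b)]/(2(d^2 - b^2))$, for $b < d$; the second case and the condition that selects between the two cases are not reproduced. Function names are ours.

```python
import math

def cor4_first_case(d, b):
    """First case of (13), for 0 <= b < d."""
    xi = 2 * math.sqrt(2 * d * (b + d))
    return (1 + b) * (xi - (d + 3 * b)) / (2 * (d**2 - b**2))

def thr_general(d):  # (7)
    return (1 + 1 / d) / 2

d = 0.01
for eps in (0.0, 0.1, 0.5):
    b = eps * d
    exact = cor4_first_case(d, b) / thr_general(d)
    approx = (2 * math.sqrt(2 * (1 + eps)) - (1 + 3 * eps)) / (1 - eps**2)
    print(f"eps={eps:.1f}  ratio={exact:.3f}  small-d approximation={approx:.3f}")
```

For $b = 0$ the first case coincides with the two-ONB threshold $(\sqrt{2} - 0.5)/d$, and the ratio to (7) is about 1.81 at $d = 0.01$.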



If $\mathbf{D}$ is the concatenation of two ONBs, and hence $a = b = 0$, the sparsity threshold in (13) reduces to

$$\|\mathbf{x}\|_0 < \begin{cases} \dfrac{\sqrt{2} - 0.5}{d}, & \text{if } d < \dfrac{1}{\sqrt{2}},\\[2ex] \dfrac{1 + 2d^2 - d}{2d^2}, & \text{otherwise.} \end{cases} \tag{15}$$

For $d < 1/\sqrt{2}$, this threshold is the same as that in (5), but it improves on (5) if $d > 1/\sqrt{2}$. In particular, unlike the threshold in (5), the threshold in (13) is guaranteed to be at least as large as that in (7).



III. Robust Sparsity Thresholds 

The deterministic sparsity thresholds for dictionaries in $\mathcal{D}(d, a, b)$ derived in the previous section (as well as those available in the literature for dictionaries in $\mathcal{D}_{\mathrm{onb}}(d)$ and $\mathcal{D}_{\mathrm{gen}}(d)$) all suffer from the so-called square-root bottleneck [11]. Specifically, from the Welch lower bound on coherence [21]

$$d \ge \sqrt{\frac{N - M}{M(N - 1)}}$$

we can conclude that, for $N \gg M$, the deterministic sparsity thresholds reported in this paper scale as $\sqrt{M}$ as $M$ grows large. Put differently, for a fixed number of nonzero entries $S$ in $\mathbf{x}$, i.e., for a fixed sparsity level, the number of measurements $M$ required to recover $\mathbf{x}$ through (P0), BP, or OMP is on the order of $S^2$. The square-root bottleneck stems from the fact that deterministic sparsity thresholds are universal thresholds in the sense of applying to all possible sparsity patterns (of cardinality $S$) and values of the corresponding nonzero entries of $\mathbf{x}$. As already mentioned in Section I, the probabilistic (i.e., robust) sparsity thresholds scale fundamentally better, namely according to $M/\log N$, which implies that the number of measurements required to recover $\mathbf{x}$ is on the order of $S \log N$ instead of $S^2$.

⁶ Recall that for $d > 2(\sqrt{2} - 1)$ the threshold in (5) drops below that in (7).
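The square-root scaling can be seen numerically. The sketch below (our function names) evaluates the Welch bound for $N = 2M$ and plugs the smallest attainable coherence into the general threshold $(1 + 1/d)/2$, which is the most favorable value that threshold can take; its ratio to $\sqrt{M}$ approaches $1/\sqrt{2}$.

```python
import math

def welch_bound(M, N):
    """Welch lower bound on the coherence of an M x N dictionary with unit-norm columns."""
    return math.sqrt((N - M) / (M * (N - 1)))

ratios = {}
for M in (64, 256, 1024, 4096):
    d_min = welch_bound(M, 2 * M)          # N = 2M, e.g., two ONBs
    best_thr = (1 + 1 / d_min) / 2          # (1 + 1/d)/2 at the smallest attainable d
    ratios[M] = best_thr / math.sqrt(M)
    print(f"M={M:5d}  d_min={d_min:.5f}  threshold={best_thr:7.2f}  threshold/sqrt(M)={ratios[M]:.4f}")
```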

We next address the following question: Given a general dictionary, can we break the square-root bottleneck by only randomizing the sparsity patterns over a certain part of the overall dictionary? The answer turns out to be in the affirmative. It was shown in [17], [11] (for the concatenation of two ONBs) that randomization of the sparsity patterns is only required over one of the two ONBs. Before stating our results for general dictionaries, let us briefly summarize the known results for concatenations of ONBs.



A. A Brief Review of Relevant Previous Work 

Robust sparsity thresholds for the concatenation of two ONBs were first reported in [10] (based on earlier work in [17]) and later improved in [11]. In Theorem 5 below, we restate a result from [11] (obtained by combining Theorems D, 13, and 14) in a slightly modified form better suited to draw parallels to the case of dictionaries in $\mathcal{D}(d, a, b)$ considered in this paper.

Theorem 5 (Tropp, 2008): Assume that⁷ $N > 2$. Let $\mathbf{D} \in \mathbb{C}^{M \times N}$ be the union of two ONBs for $\mathbb{C}^M$ given by $\mathbf{A}$ and $\mathbf{B}$ (i.e., $N = 2M$) and denote the coherence of $\mathbf{D}$ as $d$. Fix $s > 1$. Let the vector $\mathbf{x} \in \mathbb{C}^N$ have an arbitrarily chosen sparsity pattern of $n_a$ nonzero entries corresponding to columns of sub-dictionary $\mathbf{A}$ and a randomly chosen sparsity pattern of $n_b$ nonzero entries corresponding to columns of sub-dictionary $\mathbf{B}$. Suppose that

$$n_a + n_b < \min\left\{\frac{c\,d^{-2}}{s\log N}, \frac{d^{-2}}{2}\right\} \tag{16}$$

where $c = 0.004212$. If the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables⁸, then the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability exceeding $(1 - N^{-s})$.

If the total number of nonzero entries satisfies

$$n_a + n_b < \min\left\{\frac{c\,d^{-2}}{s\log N}, \frac{d^{-2}}{2}, \frac{d^{-2}}{8(s+1)\log N}\right\} \tag{17}$$

and the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables with i.i.d. phases that are uniformly distributed in $[0, 2\pi)$ (the magnitudes need not be i.i.d.), then the unique solution of both (P0) and BP applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability exceeding $(1 - 3N^{-s})$.

⁷ In [11] it is assumed that $M > 3$ (and hence $N > 6$). However, it can be shown that $N > 2$ is sufficient to establish the result.

⁸ For a definition of joint continuity, we refer to [22, p. 40].
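To illustrate the $M/\log N$ scaling in (16) and (17), the sketch below evaluates their right-hand sides for two ONBs in $\mathbb{C}^M$ with $N = 2M$, $d = 1/\sqrt{M}$ (as for the identity/Fourier pair), and $s = 1$; these parameter choices and the function names are ours. Because $c = 0.004212$ is small, the bound in (17) exceeds the deterministic two-ONB BP/OMP value $(\sqrt{2} - 0.5)/d = (\sqrt{2} - 0.5)\sqrt{M}$ only from $M = 2^{24}$ on (searching over powers of two). This compares the sizes of the bounds only; the two guarantees differ in nature (probabilistic versus worst case).

```python
import math

C = 0.004212  # constant c in Theorem 5

def rhs_16(d, N, s):
    """Right-hand side of (16): (P0) uniqueness."""
    return min(C * d**-2 / (s * math.log(N)), d**-2 / 2)

def rhs_17(d, N, s):
    """Right-hand side of (17): (P0) uniqueness and BP recovery."""
    return min(C * d**-2 / (s * math.log(N)), d**-2 / 2, d**-2 / (8 * (s + 1) * math.log(N)))

# Two ONBs in C^M with N = 2M and d = 1/sqrt(M); s = 1.
for M in (2**10, 2**16, 2**22):
    d, N = 1 / math.sqrt(M), 2 * M
    print(M, rhs_17(d, N, 1) / (M / math.log(N)))   # equals c whenever the c-term is the minimum

# Smallest power of two M for which (17) exceeds the deterministic value (sqrt(2) - 0.5)/d.
M = 2
while rhs_17(1 / math.sqrt(M), 2 * M, 1) <= (math.sqrt(2) - 0.5) * math.sqrt(M):
    M *= 2
M_cross = M
```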

An important consequence of Theorem 5 is the following: For the concatenation of two ONBs, a robust sparsity threshold $S = n_a + n_b$ of order $M/(\log N)$ is possible if the coherence $d$ of the overall dictionary is on the order of $1/\sqrt{M}$. Note that for the same coherence $d$, deterministic sparsity thresholds would suffer from the square-root bottleneck as discussed in [11]. Remarkably, Theorem 5 does not require that the positions of all nonzero entries of $\mathbf{x}$ be chosen randomly: It suffices to pick the positions of the nonzero entries of $\mathbf{x}$ corresponding to one of the two ONBs at random, while the positions of the remaining nonzero entries (all corresponding to columns in the other ONB) can be chosen arbitrarily. This essentially means that the result is universal with respect to one of the two ONBs ($\mathbf{A}$ by choice of notation here) in the sense that all possible combinations of $n_a$ columns in $\mathbf{A}$ are allowed. Randomization over the other ONB ensures that the overall sparsity patterns that cannot be recovered (with on the order of $S \log N$ measurements) are "weeded out". Moreover, randomization is needed on the values of all nonzero entries of $\mathbf{x}$, which reflects the fact that there exist certain value assignments on a given sparsity pattern that cannot be recovered with on the order of $S \log N$ measurements. In summary, Theorem 5 states that every sparsity pattern in $\mathbf{A}$, in conjunction with most sparsity patterns in $\mathbf{B}$ and most value assignments on the resulting overall sparsity pattern, can be recovered.

This result is interesting as it hints at the possibility of isolating specific parts of the dictionary D 
that require randomization to "weed out" the support sets that are not recoverable. Unfortunately, the 
two-ONB structure is too restrictive to bring out this aspect. Specifically, as the two ONBs are on equal 
footing, the result in Theorem [5] does not allow us to understand which properties of a sub-dictionary 
are responsible for problematic sparsity patterns. This motivates looking at robust sparsity thresholds for 
the concatenation of two general dictionaries. Now, we could interpret the concatenation of two general 
(sub-)dictionaries as a general dictionary in $\mathcal{D}_{\mathrm{gen}}(d)$ and apply the robust sparsity thresholds for general
dictionaries reported in [11]. This requires, however, randomization over the entire dictionary (i.e., the 
positions of all nonzero entries of x have to be chosen at random and the values as well). Hence, the robust 
sparsity threshold for general dictionaries does not allow us to isolate specific parts of the dictionary D 
that require randomization to "weed out" the support sets that are not recoverable with on the order of 
S log N measurements. 

B. Robust Sparsity Thresholds for the Concatenation of General Signal Sets 

We next derive robust sparsity thresholds for dictionaries $\mathbf{D} \in \mathcal{D}(d, a, b)$. Our results not only generalize Theorem 5 to the concatenation of two general dictionaries but, since every dictionary in $\mathcal{D}_{\mathrm{gen}}(d)$ can be viewed as the concatenation of two sub-dictionaries, also allow us to understand which part of a general dictionary requires randomization to "weed out" the support sets that are not recoverable.

Theorem 6: Assume that $N > 2$. Let $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ be a dictionary in $\mathcal{D}(d,a,b)$. Fix $s \geq 1$ and $\gamma \in [0,1]$. Consider a random vector $\mathbf{x} = [\mathbf{x}_a^T\ \mathbf{x}_b^T]^T$ where $\mathbf{x}_a \in \mathbb{C}^{N_a}$ has an arbitrarily chosen sparsity pattern of cardinality $n_a$ such that

$$6\sqrt{2}\sqrt{n_a d^2 s \log N} + 2(n_a - 1)a < (1 - \gamma)e^{-1/4} \tag{18}$$

and $\mathbf{x}_b \in \mathbb{C}^{N_b}$ has a randomly chosen sparsity pattern of cardinality $n_b$ such that

$$24\sqrt{n_b b^2 s \log N} + \frac{4n_b}{N_b}\|\mathbf{B}\|^2 + 2\sqrt{\frac{n_b}{N_b}}\,\|\mathbf{A}\|\,\|\mathbf{B}\| < \gamma e^{-1/4}. \tag{19}$$

If the total number of nonzero entries of $\mathbf{x}$ satisfies

$$n_a + n_b \leq \frac{d^{-2}}{2} \tag{20}$$

and the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables, then the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability exceeding $1 - N^{-s}$. If the total number of nonzero entries of $\mathbf{x}$ satisfies

$$n_a + n_b \leq \min\left\{\frac{d^{-2}}{2},\ \frac{d^{-2}}{8(s+1)\log N}\right\} \tag{21}$$

and the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables with i.i.d. phases that are uniformly distributed in $[0, 2\pi)$ (the magnitudes need not be i.i.d.), then the unique solution of both (P0) and BP applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability exceeding $1 - 3N^{-s}$.

Proof: The proof is based on the following lemma proven in Appendix G.

Lemma 7: Fix $s \geq 1$ and $\gamma \in [0,1]$. Let $\mathbf{S}$ be a sub-dictionary of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}] \in \mathcal{D}(d,a,b)$ containing $n_a$ arbitrarily chosen columns of $\mathbf{A}$ and $n_b$ randomly chosen columns of $\mathbf{B}$. If $n_a$ and $n_b$ satisfy (18) and (19), respectively, then the minimum singular value $\sigma_{\min}(\mathbf{S})$ of the sub-dictionary $\mathbf{S}$ obeys

$$\mathbb{P}\left\{\sigma_{\min}(\mathbf{S}) \leq \frac{1}{\sqrt{2}}\right\} \leq N^{-s}.$$

The proof of Theorem 6 then follows from Lemma 7 and the results in [11] as follows. The sparsity pattern of $\mathbf{x}$ obtained according to the conditions in Theorem 6 induces a sub-dictionary $\mathbf{S}$ of $\mathbf{D}$ containing $n_a$ arbitrarily chosen columns of $\mathbf{A}$ and $n_b$ randomly chosen columns of $\mathbf{B}$. As a consequence of Lemma 7, the smallest singular value of this sub-dictionary exceeds $1/\sqrt{2}$ with probability at least $1 - N^{-s}$.

Lemma 7, together with condition (20) and the requirement that the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables, implies, as a consequence of [11, Thm. 13] (see also Appendix H, where [11, Thm. 13] is restated for completeness), that the unique solution of (P0) applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability at least $1 - N^{-s}$.

⁹ Since we will be interested in the individual scaling behavior of $n_a$ and $n_b$ as $M$ grows large, we shall assume in the remainder of the paper that $n_a, n_b \geq 1$.

The second statement in Theorem 6 is proven as follows. Lemma 7, together with condition (21) and the requirement that the entries of $\mathbf{x}$ restricted to the chosen sparsity pattern are jointly continuous random variables with i.i.d. phases that are uniformly distributed in $[0, 2\pi)$, implies, as a consequence of [11, Thm. 13] and [11, Thm. 14] (see also Appendix H), that the unique solution of both (P0) and BP applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability at least $(1 - N^{-s})(1 - 2N^{-s}) \geq 1 - 3N^{-s}$. ■
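The four conditions of Theorem 6 are explicit enough to evaluate directly. The following minimal Python sketch does so; the helper name `theorem6_conditions`, its argument names, and the default values of $s$ and $\gamma$ are illustrative assumptions, the spectral norms $\|\mathbf{A}\|$ and $\|\mathbf{B}\|$ must be supplied by the caller, and $\log$ is assumed to be the natural logarithm.

```python
import math

def theorem6_conditions(n_a, n_b, d, a, b, N, N_b, norm_A, norm_B, s=1.0, gamma=0.5):
    """Evaluate conditions (18)-(21) of Theorem 6 (hypothetical helper).

    norm_A and norm_B are the spectral norms ||A|| and ||B||.
    Returns one boolean per condition.
    """
    c = math.exp(-0.25)
    logN = math.log(N)  # natural logarithm (assumption)
    # (18): arbitrarily chosen support in A
    lhs18 = 6 * math.sqrt(2) * math.sqrt(n_a * d**2 * s * logN) + 2 * (n_a - 1) * a
    # (19): randomly chosen support in B
    lhs19 = (24 * math.sqrt(n_b * b**2 * s * logN)
             + 4 * n_b / N_b * norm_B**2
             + 2 * math.sqrt(n_b / N_b) * norm_A * norm_B)
    total = n_a + n_b
    return {
        "18": lhs18 < (1 - gamma) * c,
        "19": lhs19 < gamma * c,
        "20": total <= d**-2 / 2,
        "21": total <= min(d**-2 / 2, d**-2 / (8 * (s + 1) * logN)),
    }
```

For example, for two ONBs ($a = b = 0$, $\|\mathbf{A}\| = \|\mathbf{B}\| = 1$) with $d = 10^{-3}$, $N = 2\cdot 10^6$, $N_b = 10^6$ and one nonzero entry in each part, all four conditions hold; increasing $n_a$ or $d$ quickly violates (18).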

Theorem 6 generalizes the result in Theorem 5 to the concatenation $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ of the general dictionaries $\mathbf{A}$ and $\mathbf{B}$. Next, we determine conditions on $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ for breaking the square-root bottleneck. More precisely, we determine conditions on $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ such that for vectors $\mathbf{x}$ with¹⁰ $n_a = \Theta(M/(\log N))$ and $n_b = \Theta(M/(\log N))$ the unique solution of both (P0) and BP applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ is given by $\mathbf{x}$ with probability at least $1 - 3N^{-s}$. This implies a robust sparsity threshold $S = n_a + n_b$ of $\Theta(M/(\log N))$. Note that we say the square-root bottleneck is broken only if both $n_a$ and $n_b$ are on the order of $M/(\log N)$.

Conditions (18)-(21) in Theorem 6 yield upper bounds on the possible values of $n_a$ and $n_b$ (such that the unique solution of both (P0) and BP is given by $\mathbf{x}$) that depend on the dictionary parameters $d$, $a$, $b$, $N_a$, $N_b$, and the spectral norms of $\mathbf{A}$ and $\mathbf{B}$. In the following, we rewrite these upper bounds by absorbing all constants (including $\gamma$ and $s$ defined in Theorem 6) that are independent of $d$, $a$, $b$, $N_a$, $N_b$, and the spectral norms of $\mathbf{A}$ and $\mathbf{B}$ in a constant $c$. Note that $c$ can take on a different value at each appearance. We then derive necessary and sufficient conditions on the dictionary parameters $d$, $a$, $b$, $N_a$, $N_b$, and the spectral norms of $\mathbf{A}$ and $\mathbf{B}$ for the resulting upper bounds on $n_a$ and $n_b$ to be on the order of $M/(\log N)$.

We start with condition (18), which, together with the obvious condition $n_a \leq N_a$, yields the following constraint on $n_a$:

$$n_a \leq c\min\left\{\frac{1}{d^2\log N},\ \frac{1}{a},\ N_a\right\}.$$

As $M$ grows large, this upper bound is compatible¹¹ with the scaling behavior $n_a = \Theta(M/(\log N))$ if and only if all of the following conditions are met:

i) the coherence of $\mathbf{D}$ satisfies $d = O(1/\sqrt{M})$

ii) the coherence of $\mathbf{A}$ satisfies $a = O((\log N)/M)$

iii) the cardinality of $\mathbf{A}$ satisfies $N_a = \Omega(M/(\log N))$.

¹⁰ Whenever for some function $g(M,N)$ we write $\Omega(g(M,N))$, $\Theta(g(M,N))$, or $O(g(M,N))$, we mean that the ratio $M/N$ remains fixed while $M \to \infty$.
Similarly, we get from (19) that¹²

$$n_b \leq c\min\left\{\frac{b^{-2}}{\log N},\ \frac{N_b}{\|\mathbf{B}\|^2},\ \frac{N_b}{\|\mathbf{A}\|^2\|\mathbf{B}\|^2}\right\}. \tag{22}$$

This upper bound is compatible with the scaling behavior $n_b = \Theta(M/(\log N))$ if and only if all of the following conditions are met:

iv) the coherence of $\mathbf{B}$ satisfies $b = O(1/\sqrt{M})$

v) the spectral norm of $\mathbf{B}$ satisfies $\|\mathbf{B}\|^2 \leq cN_b(\log N)/M$

vi) the spectral norm of $\mathbf{A}$ satisfies $\|\mathbf{A}\|^2 \leq cN_b(\log N)/(\|\mathbf{B}\|^2 M)$.

Note that iv) is implied by i) since $b \leq d$, by assumption. Finally, it follows from i) that conditions (20) and (21) are compatible with the scaling behavior $n_a = \Theta(M/(\log N))$ and $n_b = \Theta(M/(\log N))$.

In the special case of $\mathbf{A}$ and $\mathbf{B}$ being ONBs for $\mathbb{C}^M$, conditions ii)-vi) are trivially satisfied. Hence, in the two-ONB case the square-root bottleneck is broken by randomizing according to the specifications in Theorem 5 whenever $d = O(1/\sqrt{M})$, as already shown in [11]. The additional requirements ii)-vi) become relevant for general dictionaries $\mathbf{D}$ only.

We next present an example of a non-trivial dictionary $\mathbf{D}$ with sub-dictionaries $\mathbf{A}$ and $\mathbf{B}$ (not both ONBs) that satisfy i)-vi). Let $M = p^k$, with $p$ prime and $k \in \mathbb{N}^+$. For this choice of $M$ it is possible to design $M + 1$ ONBs for $\mathbb{C}^M$, which, upon concatenation, form a dictionary $\mathbf{D}$ with coherence $d$ equal to $1/\sqrt{M}$ [23], [24], [8]. In particular, the absolute value of the inner product between two distinct columns of $\mathbf{D}$ is, by construction, either $0$ or $1/\sqrt{M}$. Obviously, for such a dictionary i) is satisfied. Furthermore, identifying $\mathbf{A}$ with one of the $M + 1$ ONBs and $\mathbf{B}$ with the concatenation of the remaining $M$ ONBs, we have $a = 0$ and $N_a = M$. Hence ii) and iii) are satisfied. Since $\mathbf{B}$ consists of the concatenation of the remaining $M$ ONBs, it has coherence $b = 1/\sqrt{M}$, and, hence, iv) is satisfied. Moreover, since $\mathbf{B}$ is the concatenation of $M$ ONBs for $\mathbb{C}^M$, it forms a tight frame for $\mathbb{C}^M$. For a tight frame $\mathbf{B}$ with $N_b = M^2$ frame elements in $\mathbb{C}^M$ (all $\ell_2$-normalized to one) the nonzero eigenvalues of the Gram matrix $\mathbf{B}^H\mathbf{B}$ are all equal to $N_b/M = M$. Hence, the spectral norm of $\mathbf{B}$ satisfies $\|\mathbf{B}\|^2 = M$. Thus, v) is met. Finally, since $\mathbf{A}$ is an ONB, its spectral norm satisfies $\|\mathbf{A}\| = 1$ and, therefore, condition vi) is met. Now, as a consequence of Theorem 6 we obtain a robust sparsity threshold $S = n_a + n_b$ of order $M/(\log N)$ for the dictionary $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$. This threshold does not require that the positions of all nonzero entries of $\mathbf{x}$ are chosen randomly. Specifically, it suffices to randomize over the positions of the nonzero entries of $\mathbf{x}$ corresponding to $\mathbf{B}$, while the positions of the nonzero entries corresponding to $\mathbf{A}$ can be chosen arbitrarily. As for the two-ONB case, once the support set of $\mathbf{x}$ is chosen, the values of all nonzero entries of $\mathbf{x}$ need to be chosen at random.

¹¹ We say that an upper bound on $n_a$, $n_b$ is compatible with the scaling behavior $\Theta(M/(\log N))$ if it does not preclude this scaling behavior.

¹² Note that the obvious condition $n_b \leq N_b$ is implied by $n_b \leq N_b/\|\mathbf{B}\|^2$ since $\|\mathbf{B}\| \geq 1$.
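The parameters of this example can be checked numerically. The sketch below is restricted to the special case $M = p$ with $p$ an odd prime ($k = 1$) and uses the well-known quadratic-chirp construction of $M + 1$ mutually unbiased bases (the identity plus the bases with entries $\omega^{\alpha j^2 + \beta j}/\sqrt{p}$, $\omega = e^{2\pi i/p}$). This construction is an assumption made for illustration and is not necessarily the one used in [23], [24], [8]; the helper names are hypothetical.

```python
import numpy as np

def chirp_mub_dictionary(p):
    """Return (A, B) for an odd prime p: A = identity, B = p chirp-modulated DFT bases."""
    omega = np.exp(2j * np.pi / p)
    j = np.arange(p)
    A = np.eye(p, dtype=complex)
    blocks = []
    for alpha in range(p):
        # Entry (j, beta) of basis alpha is omega^(alpha*j^2 + beta*j) / sqrt(p)
        exponents = (alpha * j[:, None] ** 2 + j[:, None] * j[None, :]) % p
        blocks.append(omega ** exponents / np.sqrt(p))
    B = np.hstack(blocks)  # M x M^2 matrix, a tight frame for C^M
    return A, B

def coherence(X):
    """Largest absolute inner product between distinct unit-norm columns of X."""
    G = np.abs(X.conj().T @ X)
    np.fill_diagonal(G, 0.0)
    return G.max()

p = 7
A, B = chirp_mub_dictionary(p)
D = np.hstack([A, B])
d, a, b = coherence(D), coherence(A), coherence(B)
spec_B_sq = np.linalg.norm(B, 2) ** 2  # ||B||^2, expected to equal N_b / M = M
spec_A = np.linalg.norm(A, 2)          # ||A||, expected to equal 1
print(d, a, b, spec_B_sq, spec_A)
```

For $p = 7$ this reproduces $d = b = 1/\sqrt{7}$, $a = 0$, $\|\mathbf{B}\|^2 = 7$ and $\|\mathbf{A}\| = 1$, in line with i)-vi).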

Finally, as every dictionary $\mathbf{D} \in \mathcal{D}_{\mathrm{gen}}(d)$ can be viewed as the concatenation of two general dictionaries $\mathbf{A}$ and $\mathbf{B}$ such that $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$, we can now ask the following question: Given a general dictionary $\mathbf{D}$, over which part of the dictionary do we need to randomize to "weed out" the sparsity patterns that prohibit breaking the square-root bottleneck? From the results above we obtain the intuitive answer that
in the "low-coherence" part of the dictionary, namely A, we can pick the sparsity pattern arbitrarily, 
whereas the "high-coherence" part of the dictionary, namely B, requires randomization. Note that, due 
to the bounds on the coherence parameters a and b in ii) and iv), respectively, the "low-coherence" 
part A of the overall dictionary D has, in general, fewer elements than the "high-coherence" part B. 
Conditions i) - vi) can be used to identify the largest possible part A of the overall dictionary D where 
the corresponding sparsity pattern can be picked arbitrarily. Note, however, that the task of identifying 
the largest possible part A is in general difficult. 

IV. Conclusion 

We presented a generalization of the uncertainty relation for the representation of a signal in two different ONBs [2] to the representation of a signal in two general (possibly redundant or incomplete) signal sets. This novel uncertainty relation is important in the context of the analysis of signals containing two distinct features each of which can be described sparsely only in an overcomplete signal set. As shown in [25], the general uncertainty relation reported in this paper also forms the basis for establishing recovery guarantees for signals that are sparse in a (possibly overcomplete) dictionary and corrupted by noise that is also sparse in a (possibly overcomplete) dictionary.

We furthermore presented a novel deterministic sparsity threshold guaranteeing uniqueness of the (P0)-solution for general dictionaries $\mathbf{D} \in \mathcal{D}(d,a,b)$, as well as thresholds guaranteeing equivalence of this unique (P0)-solution to the solution obtained through BP and OMP. These thresholds improve on
those previously known by up to a factor of two. Moreover, the known sparsity thresholds for general 
dictionaries and those for the concatenation of two ONBs follow from our results as special cases. 

Finally, the probabilistic recovery guarantees presented in this paper allow us to understand which 
parts of a general dictionary one needs to randomize over to "weed out" the sparsity patterns that prohibit 
breaking the square-root bottleneck. 
Appendix A
Proof of Lemma 1

Assume that $\mathbf{s} \in \mathbb{C}^M$ can be represented as a linear combination of $n_a$ columns of $\mathbf{A}$ and, equivalently, as a linear combination of $n_b$ columns of $\mathbf{B}$. This means that there exists a vector $\mathbf{p}$ with $n_a$ nonzero entries and a vector $\mathbf{q}$ with $n_b$ nonzero entries such that

$$\mathbf{s} = \mathbf{A}\mathbf{p} = \mathbf{B}\mathbf{q}. \tag{23}$$

We exclude the trivial case $n_a = n_b = 0$ and note that for $n_a = 0$ or $n_b = 0$ we have $\mathbf{s} = \mathbf{0}_M$, by definition.

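Left-multiplication in (23) by $\mathbf{A}^H$ yields

$$\mathbf{A}^H\mathbf{A}\mathbf{p} = \mathbf{A}^H\mathbf{B}\mathbf{q}. \tag{24}$$

We next lower-bound the absolute value of the $i$th entry ($i = 1, \ldots, N_a$) of the vector $\mathbf{A}^H\mathbf{A}\mathbf{p}$ according to

$$\big|[\mathbf{A}^H\mathbf{A}\mathbf{p}]_i\big| = \Big|[\mathbf{p}]_i + \sum_{k \neq i}[\mathbf{A}^H\mathbf{A}]_{i,k}[\mathbf{p}]_k\Big| \geq |[\mathbf{p}]_i| - a\sum_{k \neq i}|[\mathbf{p}]_k| \tag{25}$$

$$= (1 + a)|[\mathbf{p}]_i| - a\|\mathbf{p}\|_1 \tag{26}$$

where (25) follows from the reverse triangle inequality and the fact that the off-diagonal entries of $\mathbf{A}^H\mathbf{A}$ can be upper-bounded in absolute value by $a$. Next, we upper-bound the absolute value of the $i$th entry of the vector $\mathbf{A}^H\mathbf{B}\mathbf{q}$ as follows

$$\big|[\mathbf{A}^H\mathbf{B}\mathbf{q}]_i\big| \leq d\|\mathbf{q}\|_1. \tag{27}$$

Combining (26) and (27) yields

$$(1 + a)|[\mathbf{p}]_i| - a\|\mathbf{p}\|_1 \leq d\|\mathbf{q}\|_1.$$

If we now sum over all $i$ for which $[\mathbf{p}]_i \neq 0$, we obtain

$$[(1 + a) - n_a a]\,\|\mathbf{p}\|_1 \leq n_a d\|\mathbf{q}\|_1 \tag{28}$$

where we used that $\|\mathbf{p}\|_0 = n_a$, by assumption. Since $n_a d\|\mathbf{q}\|_1 \geq 0$, we can replace the LHS of (28) by the tighter bound

$$[(1 + a) - n_a a]^+\,\|\mathbf{p}\|_1 \leq n_a d\|\mathbf{q}\|_1. \tag{29}$$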
Multiplying both sides of (23) by $\mathbf{B}^H$ and following steps similar to the ones used to arrive at (29) yields

$$[(1 + b) - n_b b]^+\,\|\mathbf{q}\|_1 \leq n_b d\|\mathbf{p}\|_1. \tag{30}$$

We now have to distinguish three cases. If both $n_a \geq 1$ and $n_b \geq 1$, and, hence, $\|\mathbf{p}\|_1 > 0$ and $\|\mathbf{q}\|_1 > 0$, we can combine (29) and (30) to obtain

$$n_a n_b d^2 \geq [(1 + a) - n_a a]^+\,[(1 + b) - n_b b]^+. \tag{31}$$

If $n_a = 0$ and $n_b \geq 1$ (i.e., $\|\mathbf{p}\|_1 = 0$ and $\|\mathbf{q}\|_1 > 0$), we get from (30) that

$$n_b \geq 1 + \frac{1}{b}. \tag{32}$$

Similarly, if $n_b = 0$ and $n_a \geq 1$ (i.e., $\|\mathbf{q}\|_1 = 0$ and $\|\mathbf{p}\|_1 > 0$), we obtain from (29) that

$$n_a \geq 1 + \frac{1}{a}. \tag{33}$$

Both (32) and (33) are contained in (31) as special cases, as is easily verified.
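As a sanity check of (31), consider the special case of the identity and the unitary DFT matrix in $\mathbb{C}^M$, so that $a = b = 0$ and $d = 1/\sqrt{M}$; then (31) reduces to the classical bound $n_a n_b \geq M$. A Dirac comb with spacing $\sqrt{M}$ (assumed here to be an integer) attains this bound with equality. The NumPy sketch below verifies this for the arbitrary choice $M = 16$.

```python
import numpy as np

M = 16
step = int(np.sqrt(M))          # comb spacing; assumes M is a perfect square
s = np.zeros(M)
s[::step] = 1.0                 # Dirac comb, s = A p with A = identity, p = s

p = s
q = np.fft.fft(s, norm="ortho")  # coefficients in the unitary DFT basis

tol = 1e-9
n_a = int(np.sum(np.abs(p) > tol))
n_b = int(np.sum(np.abs(q) > tol))

# Coherence of [I F]: every entry of the unitary DFT matrix has magnitude 1/sqrt(M)
F = np.fft.fft(np.eye(M), norm="ortho")
d = np.max(np.abs(F))

a = b = 0.0
lhs = n_a * n_b * d**2                                              # LHS of (31)
rhs = max((1 + a) - n_a * a, 0.0) * max((1 + b) - n_b * b, 0.0)     # RHS of (31)
print(n_a, n_b, lhs, rhs)
```

Both supports have four elements, so $n_a n_b d^2 = 16 \cdot (1/16) = 1$, which equals the right-hand side of (31).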



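Appendix B
Proof of Theorem 2

The proof will be effected by deriving a lower bound on the spark of dictionaries in $\mathcal{D}(d,a,b)$, which together with (1), yields the desired result (10). This will be accomplished by finding a lower bound on the minimum number of nonzero entries that a nonzero vector $\mathbf{v} \in \mathbb{C}^{N_a + N_b}$ in the kernel of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ must have. Without loss of generality, we may view $\mathbf{v}$ as the concatenation of two vectors $\mathbf{p} \in \mathbb{C}^{N_a}$ and $\mathbf{q} \in \mathbb{C}^{N_b}$, i.e., $\mathbf{v} = [\mathbf{p}^T\ \mathbf{q}^T]^T$. As $\mathbf{v}$ is in the kernel of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$, we have

$$[\mathbf{A}\ \mathbf{B}]\begin{bmatrix}\mathbf{p} \\ \mathbf{q}\end{bmatrix} = \mathbf{0}_M.$$

Therefore, the vectors $\mathbf{p}$ and $\mathbf{q}$ satisfy $\mathbf{A}\mathbf{p} = \mathbf{B}(-\mathbf{q}) = \mathbf{s}$. Let $n_a = \|\mathbf{p}\|_0$ and $n_b = \|-\mathbf{q}\|_0 = \|\mathbf{q}\|_0$ and recall that $n_a = 0$ is equivalent to $\mathbf{p} = \mathbf{0}_{N_a}$ and $n_b = 0$ is equivalent to $\mathbf{q} = \mathbf{0}_{N_b}$, both by definition. Since we require $\mathbf{v}$ to be a nonzero vector, the case $n_a = n_b = 0$ (and hence $\mathbf{p} = \mathbf{0}_{N_a}$ and $\mathbf{q} = \mathbf{0}_{N_b}$ and, therefore, $\mathbf{v} = \mathbf{0}_{N_a+N_b}$) is excluded. For all other cases, the uncertainty relation in Lemma 1 requires that the number of nonzero entries in $\mathbf{p}$ and $-\mathbf{q}$ (representing $\mathbf{s}$ according to $\mathbf{A}\mathbf{p} = \mathbf{B}(-\mathbf{q}) = \mathbf{s}$) satisfy

$$n_a n_b \geq \frac{[1 - a(n_a - 1)]^+\,[1 - b(n_b - 1)]^+}{d^2}. \tag{34}$$

Based on (34), we now derive a lower bound on $\operatorname{spark}(\mathbf{D})$ by considering the following three different cases: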
The case $n_b \geq 1$ and $n_a = 0$: In this case, the vector $\mathbf{v} = [\mathbf{p}^T\ \mathbf{q}^T]^T$ in the kernel of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ has nonzero entries only in the part $\mathbf{q}$ corresponding to sub-dictionary $\mathbf{B}$. It follows directly from (34) that

$$n_b \geq 1 + \frac{1}{b}. \tag{35}$$

The case $n_a \geq 1$ and $n_b = 0$: In this case, the vector $\mathbf{v} = [\mathbf{p}^T\ \mathbf{q}^T]^T$ in the kernel of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ has nonzero entries only in the part $\mathbf{p}$ corresponding to sub-dictionary $\mathbf{A}$. Again, direct application of (34) yields

$$n_a \geq 1 + \frac{1}{a}. \tag{36}$$

The case $n_a \geq 1$ and $n_b \geq 1$: In this case, the vector $\mathbf{v} = [\mathbf{p}^T\ \mathbf{q}^T]^T$ in the kernel of $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ has nonzero entries in both parts $\mathbf{p}$ and $\mathbf{q}$ corresponding to sub-dictionaries $\mathbf{A}$ and $\mathbf{B}$, respectively. Let $Z(\mathbf{D})$ denote the smallest possible number of nonzero entries of $\mathbf{v}$ in this case. Together with (35) and (36) we now have

$$\operatorname{spark}(\mathbf{D}) \geq \min\left\{1 + \frac{1}{b},\ 1 + \frac{1}{a},\ Z(\mathbf{D})\right\} = \min\left\{1 + \frac{1}{b},\ Z(\mathbf{D})\right\} \tag{37}$$

where we used that $a \leq b$, by assumption. We next derive a lower bound on $Z(\mathbf{D})$ that is explicit in $d$, $a$, and $b$. Specifically, we minimize $n_a + n_b$ over all pairs $(n_a, n_b)$ (with $n_a \geq 1$ and $n_b \geq 1$) that satisfy (34). Since, eventually, we are interested in finding a lower bound on $\operatorname{spark}(\mathbf{D})$, it follows from (37) that it suffices to restrict the minimization to those pairs $(n_a, n_b)$ for which both $n_a < 1 + 1/b$ and $n_b < 1 + 1/b$. This implies that $[1 - a(n_a - 1)] > 0$ and $[1 - b(n_b - 1)] > 0$, and we thus have from (34) that

$$n_a n_b \geq \frac{[1 - a(n_a - 1)]\,[1 - b(n_b - 1)]}{d^2}. \tag{38}$$

Solving (38) for $n_a$, we get

$$n_a \geq \frac{(1 + a)(1 + b) - n_b b(1 + a)}{n_b(d^2 - ab) + a(1 + b)} \triangleq f(n_b).$$

Finally, adding $n_b$ on both sides yields

$$n_a + n_b \geq f(n_b) + n_b. \tag{39}$$

To arrive at a lower bound on $n_a + n_b$ that is explicit in $d$, $a$, and $b$ only (in particular, the lower bound should be independent of $n_a$ and $n_b$), we further lower-bound the RHS of (39) by minimizing $f(n_b) + n_b$ as a function of $n_b$, under the constraints $n_a \geq 1$ and $n_b \geq 1$ (implied by assumption). This yields the following lower bound on $Z(\mathbf{D})$¹³

$$Z(\mathbf{D}) \geq \min_{n_b \geq 1}\big[\max\{f(n_b), 1\} + n_b\big] \triangleq Z(d,a,b).$$

We now have that

$$Z(d,a,b) = \min_{n_b \geq 1}\big[\max\{f(n_b), 1\} + n_b\big] \leq \big[\max\{f(n_b), 1\} + n_b\big]\Big|_{n_b = 1/b} = 1 + \frac{1}{b} \tag{40}$$

where we used the fact that $f(1/b) \leq 1$. As a consequence of (37), the inequality in (40) implies that

$$\operatorname{spark}(\mathbf{D}) \geq Z(d,a,b) = \min_{n_b \geq 1}\big[\max\{f(n_b), 1\} + n_b\big] \geq \min_{x \geq 1}\big[\max\{f(x), 1\} + x\big] \tag{41}$$

where (41) follows because minimizing over all $x \in \mathbb{R}$ with $x \geq 1$ yields a lower bound on the minimum taken over the integer parameter $n_b$ only. We next compute the minimum in (41). The function $f(x)$ can be shown to be strictly decreasing. Furthermore, the equation $f(x) = 1$ has the unique solution $x_b = (1 + b)/(b + d^2) \geq 1$, where the inequality follows because $d \leq 1$, by definition. We can therefore rewrite (41) as

$$\min_{x \geq 1}\big[\max\{f(x), 1\} + x\big] = \min_{1 \leq x \leq x_b}\big[f(x) + x\big]. \tag{42}$$

In the case $a = b = d$, the function $g(x) = f(x) + x$ reduces to the constant $1 + 1/d$ so that $\operatorname{spark}(\mathbf{D}) \geq 1 + 1/d$. In all other cases, the function $g(x)$ is strictly convex for $x > 0$. Furthermore, we have $g(1) \geq g(x_b)$ as a consequence of the assumption $a \leq b$. Hence, the minimum in (42) is attained either at the boundary point $x_b$ or at the stationary point $x_s$ of $g(x)$, which is given by

$$x_s = \frac{d\sqrt{(1 + a)(1 + b)} - a(1 + b)}{d^2 - ab} \geq 1. \tag{43}$$

The inequality in (43) follows from the convexity of $g(x)$ and the fact that $g(1) \geq g(x_b)$. If the stationary point $x_s$ is inside the interval $[1, x_b]$, the minimum is attained at $x = x_s$, otherwise it is attained at $x = x_b$.
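The recipe in (42)-(43) can be cross-checked numerically. The plain-Python sketch below evaluates $g(x) = f(x) + x$ by brute force on a grid over $[1, x_b]$ and compares the result with the closed-form candidate $g(\min\{x_s, x_b\})$. The sample values of $d$, $a$, $b$ and the grid resolution are arbitrary assumptions, and the code assumes $d^2 > ab$, i.e., it excludes the degenerate case $a = b = d$.

```python
import math

def f(x, d, a, b):
    """f(x) from Appendix B: the lower bound on n_a implied by (38) for n_b = x."""
    return ((1 + a) * (1 + b) - x * b * (1 + a)) / (x * (d**2 - a * b) + a * (1 + b))

def spark_bound_closed_form(d, a, b):
    """min over 1 <= x <= x_b of f(x) + x, via the stationary point (43) and x_b."""
    x_b = (1 + b) / (b + d**2)
    x_s = (d * math.sqrt((1 + a) * (1 + b)) - a * (1 + b)) / (d**2 - a * b)
    # (43) guarantees x_s >= 1; the clamp only guards against rounding
    x_hat = min(max(x_s, 1.0), x_b)
    return f(x_hat, d, a, b) + x_hat

def spark_bound_grid(d, a, b, num=100_001):
    """Brute-force minimum of g(x) = f(x) + x over a uniform grid on [1, x_b]."""
    x_b = (1 + b) / (b + d**2)
    best = math.inf
    for k in range(num):
        x = 1 + (x_b - 1) * k / (num - 1)
        best = min(best, f(x, d, a, b) + x)
    return best

d, a, b = 0.2, 0.05, 0.1  # arbitrary example with a <= b <= d
print(spark_bound_closed_form(d, a, b), spark_bound_grid(d, a, b))
```

Because $g$ is convex on $[1, x_b]$, the grid minimum can only exceed the closed-form value by a discretization error, and both lie above $1 + 1/d$ for this example.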

Appendix C
The Sparsity Threshold in Theorem 2 Improves on the Threshold in (7)

We show that the threshold in (10) improves on that in (7), unless $b = d$ or $d = 1$, in which case the threshold in (10) is the same as that in (7). This will be accomplished by considering the two (mutually exclusive) cases $x_b \leq x_s$ and $x_b > x_s$.

¹³ The constraints $n_a \geq f(n_b)$ and $n_a \geq 1$ are combined into $n_a \geq \max\{f(n_b), 1\}$.

The case $x_b \leq x_s$: The threshold in (10) equals

$$\frac{f(x_b) + x_b}{2} = \frac{1}{2}\left(1 + \frac{1 + b}{b + d^2}\right).$$

It is now easily verified that

$$\frac{1}{2}\left(1 + \frac{1 + b}{b + d^2}\right) \geq \frac{1}{2}\left(1 + \frac{1}{d}\right) \tag{44}$$

for all $b \leq d \leq 1$ with equality if and only if $b = d$ or $d = 1$. Note that for $b = d$ (irrespective of $a$) or for $d = 1$ (irrespective of $a$ and $b$), we have $x_b \leq x_s$.

The case $x_b > x_s$: Set $\lambda = \sqrt{(1 + a)(1 + b)} - d$. The function $(f(x_s) + x_s)/2$, which we denote as $h(a,b,d)$ to highlight its dependency on the variables $a$, $b$, and $d$, is strictly decreasing in $a$ (for fixed $b$ and $d$) as long as $b/d < \lambda < d/b$. Since $x_b > x_s$ implies $b < d$, and since $a \leq b$, by assumption, the inequality $\lambda < d/b$ is always satisfied. The inequality $b/d < \lambda$ holds whenever $x_s < x_b$, which is satisfied by assumption. Hence, we have that

$$\frac{f(x_s) + x_s}{2} = h(a,b,d) \geq h(b,b,d) = \frac{1 + b}{d + b} \geq \frac{1}{2}\left(1 + \frac{1}{d}\right). \tag{45}$$

Note that equality in (44) and (45) holds if and only if $a = b = d$, already treated in the case $x_b \leq x_s$.
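Inequalities (44) and (45) can also be spot-checked on a grid. The plain-Python sketch below evaluates the threshold in (10) as half the minimum of $g(x) = f(x) + x$ over $[1, x_b]$, i.e., $(f(\hat{x}) + \hat{x})/2$ with $\hat{x} = \min\{x_s, x_b\}$ as in the two cases above, and records the smallest gap to $(1 + 1/d)/2$ over $0 \leq a \leq b \leq d \leq 1$. The grid spacing is an arbitrary choice, and the degenerate case $a = b = d$ is skipped.

```python
import math

def g(x, d, a, b):
    """g(x) = f(x) + x with f(x) as defined in Appendix B."""
    f = ((1 + a) * (1 + b) - x * b * (1 + a)) / (x * (d**2 - a * b) + a * (1 + b))
    return f + x

def threshold_10(d, a, b):
    """Half the minimum of g on [1, x_b], attained at min(x_s, x_b)."""
    x_b = (1 + b) / (b + d**2)
    x_s = (d * math.sqrt((1 + a) * (1 + b)) - a * (1 + b)) / (d**2 - a * b)
    x_hat = min(max(x_s, 1.0), x_b)  # clamp guards against rounding only
    return g(x_hat, d, a, b) / 2

worst_gap = math.inf
steps = 40
for i in range(1, steps + 1):
    d = i / steps
    for j in range(0, i + 1):
        b = d * j / i
        for k in range(0, j + 1):
            a = b * k / j if j > 0 else 0.0
            if d**2 - a * b <= 1e-12:  # degenerate case a = b = d
                continue
            gap = threshold_10(d, a, b) - 0.5 * (1 + 1 / d)
            worst_gap = min(worst_gap, gap)
print(worst_gap)
```

The smallest gap found is zero up to rounding, attained for $b = d$, consistent with the equality conditions stated above.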



Appendix D
Proof of Theorem 3

Our proof essentially follows the program laid out in [26] for dictionaries in $\mathcal{D}_{\mathrm{onb}}(d)$, with appropriate modifications to account for the fact that we are dealing with the concatenation of two general dictionaries. Let $\mathbf{S}$ be the matrix that contains the columns of $\mathbf{A}$ and $\mathbf{B}$ participating in the representation of $\mathbf{y} = [\mathbf{A}\ \mathbf{B}]\mathbf{x}$, i.e., the columns in $[\mathbf{A}\ \mathbf{B}]$ corresponding to the nonzero entries in $\mathbf{x}$. A sufficient condition for BP and OMP applied to $\mathbf{y} = [\mathbf{A}\ \mathbf{B}]\mathbf{x}$ to recover $\mathbf{x}$ is [9, Thm. 3.1, Thm. 3.3]

$$\max_i \|\mathbf{S}^\dagger\mathbf{d}_i\|_1 < 1 \tag{46}$$

where $\mathbf{S}^\dagger = (\mathbf{S}^H\mathbf{S})^{-1}\mathbf{S}^H$ and the maximization in (46) is performed over all columns $\mathbf{d}_i$ in $\mathbf{D}$ that do not appear in $\mathbf{S}$. We prove the theorem by first carefully bounding the absolute value of each element of the vector $\mathbf{S}^\dagger\mathbf{d}_i$. Concretely, we start with the following inequality

$$|\mathbf{S}^\dagger\mathbf{d}_i| \leq |(\mathbf{S}^H\mathbf{S})^{-1}|\,|\mathbf{S}^H\mathbf{d}_i|$$

and then bound the absolute value of each entry of the matrix $(\mathbf{S}^H\mathbf{S})^{-1}$ and of each element of the vector $\mathbf{S}^H\mathbf{d}_i$. We will verify below that the matrix $\mathbf{S}^H\mathbf{S}$ is invertible. To simplify notation, for any matrix $\mathbf{U}$, we let $|\mathbf{U}|$ be the matrix with entries $[|\mathbf{U}|]_{k,l} = |[\mathbf{U}]_{k,l}|$. Furthermore, if for two matrices $\mathbf{U}$ and $\mathbf{W}$ of the same size we have that $|[\mathbf{U}]_{k,l}| \leq |[\mathbf{W}]_{k,l}|$ for all pairs $(k,l)$, we shall write $|\mathbf{U}| \leq |\mathbf{W}|$.

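A. Bound on the Elements of $(\mathbf{S}^H\mathbf{S})^{-1}$

Since the columns of $\mathbf{D}$ are $\ell_2$-normalized to 1, we can write $\mathbf{S}^H\mathbf{S} = \mathbf{I}_{n_a+n_b} - \mathbf{K}$, where $-\mathbf{K}$ contains the off-diagonal elements of $\mathbf{S}^H\mathbf{S}$. Clearly,

$$|\mathbf{K}| \leq \begin{bmatrix} a(\mathbf{1}_{n_a,n_a} - \mathbf{I}_{n_a}) & d\,\mathbf{1}_{n_a,n_b} \\ d\,\mathbf{1}_{n_b,n_a} & b(\mathbf{1}_{n_b,n_b} - \mathbf{I}_{n_b}) \end{bmatrix} \leq \begin{bmatrix} b(\mathbf{1}_{n_a,n_a} - \mathbf{I}_{n_a}) & d\,\mathbf{1}_{n_a,n_b} \\ d\,\mathbf{1}_{n_b,n_a} & b(\mathbf{1}_{n_b,n_b} - \mathbf{I}_{n_b}) \end{bmatrix} = -b\,\mathbf{I}_{n_a+n_b} + d\,\mathbf{1}_{n_a+n_b,n_a+n_b} - (d - b)\mathbf{T} \tag{47}$$

where we set

$$\mathbf{T} = \begin{bmatrix} \mathbf{1}_{n_a,n_a} & \mathbf{0}_{n_a,n_b} \\ \mathbf{0}_{n_b,n_a} & \mathbf{1}_{n_b,n_b} \end{bmatrix}.$$

As a consequence of (47) and using the assumption $n_b \geq n_a$, we have that $\|\mathbf{K}\|_1 \leq dn_b + b(n_a - 1)$, where $\|\cdot\|_1$ denotes the maximum absolute column sum of a matrix. Since $\|\cdot\|_1$ is a matrix norm [27, p. 294], the matrix $\mathbf{S}^H\mathbf{S}$ is invertible whenever $dn_b + b(n_a - 1) < 1$, and, moreover, we can expand $(\mathbf{S}^H\mathbf{S})^{-1}$ into a Neumann series according to $(\mathbf{S}^H\mathbf{S})^{-1} = \mathbf{I}_{n_a+n_b} + \sum_{k=1}^{\infty}\mathbf{K}^k$. As the condition in (12) implies that $dn_b + b(n_a - 1) < 1$, we have

$$|(\mathbf{S}^H\mathbf{S})^{-1}| = \Big|\mathbf{I}_{n_a+n_b} + \sum_{k=1}^{\infty}\mathbf{K}^k\Big| \tag{48}$$

$$\leq \mathbf{I}_{n_a+n_b} + \sum_{k=1}^{\infty}|\mathbf{K}|^k \tag{49}$$

$$\leq \mathbf{I}_{n_a+n_b} + \sum_{k=1}^{\infty}\big[-b\,\mathbf{I}_{n_a+n_b} + d\,\mathbf{1}_{n_a+n_b,n_a+n_b} - (d - b)\mathbf{T}\big]^k$$

$$= \big[(1 + b)\mathbf{I}_{n_a+n_b} + (d - b)\mathbf{T} - d\,\mathbf{1}_{n_a+n_b,n_a+n_b}\big]^{-1}$$

$$= \big[\mathbf{I}_{n_a+n_b} - d\,\mathbf{X}^{-1}\mathbf{1}_{n_a+n_b,n_a+n_b}\big]^{-1}\mathbf{X}^{-1} \tag{50}$$

where $\mathbf{X} = (1 + b)\mathbf{I}_{n_a+n_b} + (d - b)\mathbf{T}$. Here, in (49) we used the triangle inequality and the fact that $|\mathbf{K}^k| \leq |\mathbf{K}|^k$; the subsequent step follows from (47) because all matrices involved are entrywise nonnegative. We next compute the inverses in (50). To get $\mathbf{X}^{-1}$, we use the fact that $\mathbf{X}$ is a block-diagonal matrix and apply Woodbury's identity [27, p. 19] to each of the two blocks¹⁴, which yields

$$\mathbf{X}^{-1} = \frac{1}{1 + b}\begin{bmatrix} \mathbf{I}_{n_a} - \frac{d - b}{(d - b)n_a + 1 + b}\mathbf{1}_{n_a,n_a} & \mathbf{0}_{n_a,n_b} \\ \mathbf{0}_{n_b,n_a} & \mathbf{I}_{n_b} - \frac{d - b}{(d - b)n_b + 1 + b}\mathbf{1}_{n_b,n_b} \end{bmatrix}. \tag{51}$$

Next, setting $c_a = [(d - b)n_a + 1 + b]^{-1}$, $c_b = [(d - b)n_b + 1 + b]^{-1}$, and

$$\mathbf{v} = d\begin{bmatrix} c_a\mathbf{1}_{n_a} \\ c_b\mathbf{1}_{n_b} \end{bmatrix}$$

steps similar to the ones reported in [26, Eq. (A.2)-(A.3)] yield

$$\big[\mathbf{I}_{n_a+n_b} - d\,\mathbf{X}^{-1}\mathbf{1}_{n_a+n_b,n_a+n_b}\big]^{-1} = \mathbf{I}_{n_a+n_b} + \frac{1}{1 - d(c_a n_a + c_b n_b)}\,\mathbf{v}\,\mathbf{1}_{n_a+n_b}^T. \tag{52}$$

Using the fact, shown in (50), that

$$|(\mathbf{S}^H\mathbf{S})^{-1}| \leq \big[\mathbf{I}_{n_a+n_b} - d\,\mathbf{X}^{-1}\mathbf{1}_{n_a+n_b,n_a+n_b}\big]^{-1}\mathbf{X}^{-1}$$

we can combine (51) and (52) to obtain an upper bound on the absolute value of each entry of $(\mathbf{S}^H\mathbf{S})^{-1}$.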
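The closed forms (51) and (52), and the summed Neumann series in (50), can be cross-checked against direct matrix inversion. The NumPy sketch below does this for arbitrary sample values of $n_a$, $n_b$, $b$, $d$ (assumptions chosen so that $b \leq d$, $n_b \geq n_a$, $dn_b + b(n_a - 1) < 1$, and $d(c_a n_a + c_b n_b) < 1$); the series is truncated after 200 terms.

```python
import numpy as np

n_a, n_b, b, d = 3, 5, 0.02, 0.05  # arbitrary sample values with b <= d, n_b >= n_a
n = n_a + n_b

# T is block-diagonal with all-ones blocks; ones is the all-ones matrix 1_{n,n}
T = np.zeros((n, n))
T[:n_a, :n_a] = 1.0
T[n_a:, n_a:] = 1.0
ones = np.ones((n, n))
X = (1 + b) * np.eye(n) + (d - b) * T

# (51): blockwise Woodbury (Sherman-Morrison) formula for X^{-1}
X_inv = np.zeros((n, n))
for lo, hi, m in ((0, n_a, n_a), (n_a, n, n_b)):
    X_inv[lo:hi, lo:hi] = (np.eye(m) - (d - b) / ((d - b) * m + 1 + b) * np.ones((m, m))) / (1 + b)

# (52): rank-one inverse with c_a, c_b and v = d [c_a 1; c_b 1]
c_a = 1 / ((d - b) * n_a + 1 + b)
c_b = 1 / ((d - b) * n_b + 1 + b)
v = d * np.concatenate([c_a * np.ones(n_a), c_b * np.ones(n_b)])
rank_one_inv = np.eye(n) + np.outer(v, np.ones(n)) / (1 - d * (c_a * n_a + c_b * n_b))

direct = np.linalg.inv(np.eye(n) - d * np.linalg.inv(X) @ ones)
err_51 = np.abs(X_inv - np.linalg.inv(X)).max()
err_52 = np.abs(rank_one_inv - direct).max()

# (50): the Neumann series of the entrywise bound sums to [I - d X^{-1} 1]^{-1} X^{-1}
G_bound = -b * np.eye(n) + d * ones - (d - b) * T
neumann = sum(np.linalg.matrix_power(G_bound, k) for k in range(200))
err_50 = np.abs(neumann - rank_one_inv @ X_inv).max()
print(err_51, err_52, err_50)
```

All three errors are at the level of floating-point rounding for these parameters.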



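B. Bound on the Elements of $\mathbf{S}^H\mathbf{d}_i$

Let $\mathbf{d}_i$ be a column of $\mathbf{D}$ that does not appear in $\mathbf{S}$. Assume that $\mathbf{d}_i \in \mathbf{A}$ (we will later show that in searching the maximum in (46) it is, indeed, sufficient to assume $\mathbf{d}_i \in \mathbf{A}$). Then, we have

$$|\mathbf{S}^H\mathbf{d}_i| \leq \begin{bmatrix} a\,\mathbf{1}_{n_a} \\ d\,\mathbf{1}_{n_b} \end{bmatrix} \leq \begin{bmatrix} b\,\mathbf{1}_{n_a} \\ d\,\mathbf{1}_{n_b} \end{bmatrix}. \tag{53}$$

As a side remark, we note that we lose the dependency of our final result on $a$ through the bounds (47) and (53).

¹⁴ To apply Woodbury's identity, we exploit the fact that $\mathbf{1}_{n,n} = \mathbf{1}_n\mathbf{1}_n^T$.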
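C. Putting the Pieces Together

Substituting (52) into (50) and using (51) and (53), we get

$$|\mathbf{S}^\dagger\mathbf{d}_i| \leq \left(\mathbf{I}_{n_a+n_b} + \frac{\mathbf{v}\,\mathbf{1}_{n_a+n_b}^T}{1 - d(c_a n_a + c_b n_b)}\right)\mathbf{X}^{-1}\begin{bmatrix} b\,\mathbf{1}_{n_a} \\ d\,\mathbf{1}_{n_b} \end{bmatrix}$$

$$= \begin{bmatrix} bc_a\mathbf{1}_{n_a} \\ dc_b\mathbf{1}_{n_b} \end{bmatrix} + \frac{bc_a n_a + dc_b n_b}{1 - d(c_a n_a + c_b n_b)}\,\mathbf{v}$$

$$= \frac{1}{1 - d(c_a n_a + c_b n_b)}\begin{bmatrix} \big(bc_a + (d - b)dn_b c_a c_b\big)\mathbf{1}_{n_a} \\ \big(dc_b - (d - b)dn_a c_a c_b\big)\mathbf{1}_{n_b} \end{bmatrix} \tag{54}$$

where the first equality uses $\mathbf{X}^{-1}[b\,\mathbf{1}_{n_a}^T\ d\,\mathbf{1}_{n_b}^T]^T = [bc_a\mathbf{1}_{n_a}^T\ dc_b\mathbf{1}_{n_b}^T]^T$, which follows from (51). Summing the RHS of (54) over all entries of the vector $\mathbf{S}^\dagger\mathbf{d}_i$ yields the following upper bound on $\|\mathbf{S}^\dagger\mathbf{d}_i\|_1$:

$$\|\mathbf{S}^\dagger\mathbf{d}_i\|_1 \leq \frac{bc_a n_a + dc_b n_b}{1 - d(c_a n_a + c_b n_b)}. \tag{55}$$

If we instead assume that $\mathbf{d}_i \in \mathbf{B}$ and apply the same steps as before, we find that

$$\|\mathbf{S}^\dagger\mathbf{d}_i\|_1 \leq \frac{dc_a n_a + bc_b n_b}{1 - d(c_a n_a + c_b n_b)}. \tag{56}$$

Since $bc_a n_a + dc_b n_b \geq dc_a n_a + bc_b n_b$ (this is equivalent to $(d - b)(c_b n_b - c_a n_a) \geq 0$, which holds because $d \geq b$ and the map $n \mapsto n/((d - b)n + 1 + b)$ is nondecreasing with $n_b \geq n_a$), it follows that

$$\frac{dc_a n_a + bc_b n_b}{1 - d(c_a n_a + c_b n_b)} \leq \frac{bc_a n_a + dc_b n_b}{1 - d(c_a n_a + c_b n_b)}$$

and hence

$$\max_i \|\mathbf{S}^\dagger\mathbf{d}_i\|_1 \leq \frac{bc_a n_a + dc_b n_b}{1 - d(c_a n_a + c_b n_b)}.$$

We can therefore conclude that a sufficient condition for BP and OMP applied to $\mathbf{y} = \mathbf{D}\mathbf{x}$ to recover $\mathbf{x}$ is

$$\frac{bc_a n_a + dc_b n_b}{1 - d(c_a n_a + c_b n_b)} < 1. \tag{57}$$

Simple algebraic manipulations reveal that (57) is equivalent to (12).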
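To see the bound (55)-(57) in action, the NumPy sketch below computes the exact quantity $\max_i\|\mathbf{S}^\dagger\mathbf{d}_i\|_1$ from (46) for an identity/DFT pair in $\mathbb{C}^{64}$ (so $a = b = 0$ and $d = 1/8$) and compares it with the left-hand side of (57). The dictionary, the dimension, and the support (one column of $\mathbf{A}$, two columns of $\mathbf{B}$, so that $n_b \geq n_a$) are illustrative assumptions.

```python
import numpy as np

M = 64
A = np.eye(M, dtype=complex)
B = np.fft.fft(np.eye(M), norm="ortho")  # unitary DFT basis, entries of magnitude 1/sqrt(M)
D = np.hstack([A, B])
a, b, d = 0.0, 0.0, 1 / np.sqrt(M)

support_a = [3]                 # n_a = 1 column of A (arbitrary)
support_b = [M + 5, M + 17]     # n_b = 2 columns of B (arbitrary)
support = support_a + support_b
n_a, n_b = len(support_a), len(support_b)

S = D[:, support]
S_pinv = np.linalg.pinv(S)      # equals (S^H S)^{-1} S^H since S has full column rank
others = [i for i in range(D.shape[1]) if i not in support]
erc = max(np.abs(S_pinv @ D[:, i]).sum() for i in others)  # LHS of (46)

# Upper bound from (55)-(57), valid for n_b >= n_a
c_a = 1 / ((d - b) * n_a + 1 + b)
c_b = 1 / ((d - b) * n_b + 1 + b)
bound = (b * c_a * n_a + d * c_b * n_b) / (1 - d * (c_a * n_a + c_b * n_b))
print(erc, bound)
```

Since the bound is below one here, (57) certifies recovery by BP and OMP for this support, and the exact value of (46) indeed stays below the bound.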



Appendix E
Proof of Corollary 4

We obtain Corollary 4 as a consequence of Theorem 3 as follows. For given $n_b \geq n_a$ it follows from (12) that a sufficient condition for BP and OMP to recover the unknown vector $\mathbf{x}$ is

$$n_a < \frac{(1 + b)^2 - n_b(1 + b)(d + b)}{2b(1 + b) + 2n_b(d^2 - b^2)} \triangleq h(n_b).$$

To arrive at a sparsity threshold that is explicit in $b$ and $d$ only, we minimize $h(n_b) + n_b$ over $n_b$, under the constraint $n_b \geq 1$ (recall that $n_b \geq n_a$ and note that representing a nonzero vector $\mathbf{y} \in \mathbb{C}^M$ requires at least one column of $\mathbf{D}$). Furthermore, we have that

$$\min_{n_b \geq 1}\big[h(n_b) + n_b\big] \geq \min_{x \geq 1}\big[h(x) + x\big] = S \tag{58}$$

where $x \in \mathbb{R}$. Clearly, minimizing over all $x \geq 1$ with $x \in \mathbb{R}$, as opposed to integer values only, can only yield a smaller value for the minimum. In the case $b = d$, the function $h(x) + x$ reduces to the constant $(1 + 1/d)/2$, thereby recovering the previously known sparsity threshold in (7). In all other cases, the function $h(x) + x$ is strictly convex for $x > 0$. Hence, the minimum in (58) is attained either at the boundary point $x = 1$ or at the stationary point $x_s$ of $h(x) + x$, given by

$$x_s = \frac{(1 + b)\left(\sqrt{2d(b + d)} - 2b\right)}{2(d^2 - b^2)}.$$

If the stationary point satisfies $x_s > 1$, then the minimum in (58) is attained at the stationary point, otherwise the minimum is attained at the boundary point $x = 1$. The condition $x_s > 1$ is equivalent to the condition $\kappa(d,b) > 1$ (where $\kappa(d,b)$ is defined in (14)). If $\kappa(d,b) > 1$ the minimum in (58) is given by

$$S = \frac{(1 + b)\left[2\sqrt{2}\sqrt{d(b + d)} - (d + 3b)\right]}{2(d^2 - b^2)}.$$

If $\kappa(d,b) \leq 1$, the minimum in (58) is attained at the boundary point $x = 1$ and is given by

$$S = \frac{1 + 2d^2 + 3b - d(1 + b)}{2(d^2 + b)}. \tag{59}$$

Note that for $b = d$ the sparsity threshold in (59) reduces to that in (7).
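The case distinction above translates directly into code. The plain-Python sketch below returns $S$ from (58)-(59) by testing whether the stationary point $x_s$ exceeds 1 (which the text states is equivalent to $\kappa(d,b) > 1$), and cross-checks the result against a brute-force grid minimization of $h(x) + x$. The function names, the grid range $[1, \max\{2, 2x_s\}]$, and the sample pairs $(d,b)$ are illustrative assumptions; the code assumes $b < d$.

```python
import math

def h_plus_x(x, d, b):
    """h(x) + x with h as defined in Appendix E."""
    h = ((1 + b)**2 - x * (1 + b) * (d + b)) / (2 * b * (1 + b) + 2 * x * (d**2 - b**2))
    return h + x

def stationary_point(d, b):
    return (1 + b) * (math.sqrt(2 * d * (b + d)) - 2 * b) / (2 * (d**2 - b**2))

def corollary4_threshold(d, b):
    """Closed-form minimum of h(x) + x over x >= 1 (assumes b < d)."""
    if stationary_point(d, b) > 1:
        return (1 + b) * (2 * math.sqrt(2) * math.sqrt(d * (b + d)) - (d + 3 * b)) / (2 * (d**2 - b**2))
    return (1 + 2 * d**2 + 3 * b - d * (1 + b)) / (2 * (d**2 + b))  # boundary x = 1, eq. (59)

def grid_threshold(d, b, num=100_001):
    """Brute-force minimum on [1, max(2, 2 x_s)], which contains the minimizer."""
    x_max = max(2.0, 2 * stationary_point(d, b))
    return min(h_plus_x(1 + (x_max - 1) * k / (num - 1), d, b) for k in range(num))

for d, b in [(0.1, 0.01), (0.3, 0.25), (0.9, 0.3)]:
    print(d, b, corollary4_threshold(d, b), grid_threshold(d, b))
```

The first two pairs fall into the stationary-point case and the third into the boundary case (59); in all three the threshold is at least $(1 + 1/d)/2$.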

Appendix F
The Sparsity Threshold in Corollary 4 Improves on the Threshold in (7)

We show that the threshold in Corollary 4 improves on that in (7), unless $b = d$ or $d = 1$, in which case the threshold in Corollary 4 is the same as that in (7). Let us first consider the case when the RHS of (13) in Corollary 4 reduces to

$$S = \frac{1 + 2d^2 + 3b - d(1 + b)}{2(d^2 + b)}.$$

We need to establish that

$$\frac{1 + 2d^2 + 3b - d(1 + b)}{2(d^2 + b)} \geq \frac{1}{2}\left(1 + \frac{1}{d}\right) \tag{60}$$

with equality if and only if $b = d$ or $d = 1$. Straightforward calculations reveal that the inequality (60) is equivalent to

$$(d - b)(1 - d)^2 \geq 0 \tag{61}$$

which is satisfied for all $b \leq d$. Furthermore, equality in (61) holds if and only if $b = d$ or $d = 1$.
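The "straightforward calculations" behind the step from (60) to (61) amount to the identity

$$\frac{1 + 2d^2 + 3b - d(1 + b)}{2(d^2 + b)} - \frac{1}{2}\left(1 + \frac{1}{d}\right) = \frac{(d - b)(1 - d)^2}{2d(d^2 + b)},$$

obtained by bringing both sides to the common denominator $2d(d^2 + b) > 0$. The short plain-Python check below confirms this identity at random sample points with $0 \leq b \leq d \leq 1$; the sampling scheme and seed are arbitrary choices.

```python
import random

def lhs_60(d, b):
    return (1 + 2 * d**2 + 3 * b - d * (1 + b)) / (2 * (d**2 + b))

def rhs_60(d):
    return 0.5 * (1 + 1 / d)

def factored(d, b):
    """(d - b)(1 - d)^2 / (2 d (d^2 + b)), the claimed value of lhs_60 - rhs_60."""
    return (d - b) * (1 - d)**2 / (2 * d * (d**2 + b))

rng = random.Random(0)
max_err = 0.0
for _ in range(10_000):
    d = rng.uniform(1e-3, 1.0)
    b = rng.uniform(0.0, d)  # 0 <= b <= d <= 1
    max_err = max(max_err, abs((lhs_60(d, b) - rhs_60(d)) - factored(d, b)))
print(max_err)
```

Since the denominator is positive, the sign of the difference is that of $(d - b)(1 - d)^2$, which is exactly (61).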
Next, we consider the case $b < d$ and $\kappa(d,b) > 1$ so that the RHS of (13) reduces to

$$S = \frac{(1 + b)\left[2\sqrt{2}\sqrt{d(b + d)} - (d + 3b)\right]}{2(d^2 - b^2)}.$$

For $d \leq 7/9$ it can be verified that

$$\frac{(1 + b)\left[2\sqrt{2}\sqrt{d(b + d)} - (d + 3b)\right]}{2(d^2 - b^2)} > \frac{1}{2}\left(1 + \frac{1}{d}\right).$$

It turns out that a necessary condition for $\kappa(d,b) > 1$ is $d < 1/\sqrt{2}$. The proof is completed by noting that $1/\sqrt{2} < 7/9$.



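Appendix G
Proof of Lemma 7

Since the minimum singular value $\sigma_{\min}(\mathbf{S})$ of the sub-dictionary $\mathbf{S}$ can be lower-bounded as $\sigma_{\min}^2(\mathbf{S}) \geq 1 - \|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\|$, we have

$$\mathbb{P}\left\{\sigma_{\min}(\mathbf{S}) \leq \frac{1}{\sqrt{2}}\right\} = \mathbb{P}\left\{\sigma_{\min}^2(\mathbf{S}) \leq \frac{1}{2}\right\} \leq \mathbb{P}\left\{1 - \|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\| \leq \frac{1}{2}\right\} = \mathbb{P}\left\{\|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\| \geq \frac{1}{2}\right\}. \tag{62}$$

Next, we study the tail behavior of the random variable $H = \|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\|$, which will then allow us to upper-bound $\mathbb{P}\{\|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\| \geq 1/2\}$. To this end the following lemma, which follows from Markov's inequality, will be useful.

Lemma 8 ([11, Prop. 10]): If the moments of a nonnegative random variable $R$ can be upper-bounded as $[\mathbb{E}(R^q)]^{1/q} \leq \alpha\sqrt{q} + \beta$ for all $q \geq Q \geq 1$, where $\alpha, \beta \geq 0$, then

$$\mathbb{P}\{R \geq e^{1/4}(\alpha u + \beta)\} \leq e^{-u^2/4}$$

for all $u \geq \sqrt{Q}$.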

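To be able to apply Lemma 8 to $H = \|\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b}\|$, we first need an upper bound on $[\mathbb{E}(H^q)]^{1/q}$ that is of the form $\alpha\sqrt{q} + \beta$. To derive this upper bound, we start by writing $\mathbf{S}$ as $\mathbf{S} = [\mathbf{S}_a\ \mathbf{S}_b]$, where $\mathbf{S}_a$ and $\mathbf{S}_b$ denote the matrices containing the columns chosen arbitrarily from $\mathbf{A}$ and randomly from $\mathbf{B}$, respectively. We then obtain

$$\mathbf{S}^H\mathbf{S} - \mathbf{I}_{n_a+n_b} = \begin{bmatrix} \mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a} & \mathbf{S}_a^H\mathbf{S}_b \\ \mathbf{S}_b^H\mathbf{S}_a & \mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b} \end{bmatrix}.$$

Applying the triangle inequality for operator norms, we can now upper-bound $H$ according to

$$H \leq \left\|\begin{bmatrix} \mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b} \end{bmatrix}\right\| + \left\|\begin{bmatrix} \mathbf{0} & \mathbf{S}_a^H\mathbf{S}_b \\ \mathbf{S}_b^H\mathbf{S}_a & \mathbf{0} \end{bmatrix}\right\|$$

$$\leq \max\left\{\|\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}\|,\ \|\mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b}\|\right\} + \|\mathbf{S}_a^H\mathbf{S}_b\|$$

$$\leq \|\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}\| + \|\mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b}\| + \|\mathbf{S}_a^H\mathbf{S}_b\| \tag{63}$$

where the second inequality follows because the spectral norm of both a block-diagonal matrix and an anti-block-diagonal matrix is given by the largest among the spectral norms of the individual nonzero blocks. Next, we define $H_a = \|\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}\|$, $H_b = \|\mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b}\|$, and $Z = \|\mathbf{S}_a^H\mathbf{S}_b\|$. It then follows from (63) that for all $q \geq 1$

$$[\mathbb{E}(H^q)]^{1/q} \leq [\mathbb{E}((H_a + H_b + Z)^q)]^{1/q} \leq [\mathbb{E}(H_a^q)]^{1/q} + [\mathbb{E}(H_b^q)]^{1/q} + [\mathbb{E}(Z^q)]^{1/q} = H_a + [\mathbb{E}(H_b^q)]^{1/q} + [\mathbb{E}(Z^q)]^{1/q} \tag{64}$$

where the second inequality is a consequence of the triangle inequality for the norm $[\mathbb{E}(|\cdot|^q)]^{1/q}$ (recall that we assumed $q \geq 1$ and hence $[\mathbb{E}(|\cdot|^q)]^{1/q}$ is a norm), and in the last step we used the fact that $H_a$ is a deterministic quantity. All expectations in (64) are with respect to the random choice of columns from the sub-dictionary $\mathbf{B}$.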
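Two deterministic facts used in this appendix can be spot-checked numerically: the block-norm argument behind (63) (the spectral norm of a block-diagonal or anti-block-diagonal matrix equals the largest spectral norm of its nonzero blocks), and the Geršgorin bound (65) on $\|\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}\|$ for unit-norm columns with coherence $a$. The NumPy sketch below uses random real Gaussian blocks and a random unit-norm matrix; all sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def spec(X):
    """Spectral norm (largest singular value)."""
    return np.linalg.norm(X, 2)

P = rng.standard_normal((3, 3))
Q = rng.standard_normal((4, 4))
R = rng.standard_normal((3, 4))

block_diag = np.block([[P, np.zeros((3, 4))], [np.zeros((4, 3)), Q]])
anti_diag = np.block([[np.zeros((3, 3)), R], [R.T, np.zeros((4, 4))]])
err_diag = abs(spec(block_diag) - max(spec(P), spec(Q)))
err_anti = abs(spec(anti_diag) - spec(R))

# Gersgorin bound (65): ||S_a^H S_a - I|| <= (n_a - 1) a for unit-norm columns
n_a = 5
S_a = rng.standard_normal((20, n_a))
S_a /= np.linalg.norm(S_a, axis=0)       # normalize each column to unit l2-norm
G = S_a.T @ S_a
a = np.max(np.abs(G - np.eye(n_a)))      # coherence of the columns of S_a
print(err_diag, err_anti, spec(G - np.eye(n_a)), (n_a - 1) * a)
```

The two block-norm errors vanish up to rounding, and the spectral norm of $\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}$ never exceeds $(n_a - 1)a$.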

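We next upper-bound the three terms on the RHS of (64) individually. Applying Geršgorin's disc theorem [27, Thm. 6.1.1] to the first term, we obtain

$$H_a = \|\mathbf{S}_a^H\mathbf{S}_a - \mathbf{I}_{n_a}\| \leq (n_a - 1)a. \tag{65}$$

For the second term, we use [11, Eq. (6.1)] to get

$$[\mathbb{E}(H_b^q)]^{1/q} = \left[\mathbb{E}\left(\|\mathbf{S}_b^H\mathbf{S}_b - \mathbf{I}_{n_b}\|^q\right)\right]^{1/q} \leq \sqrt{144\,b^2 n_b r_1} + \frac{2n_b}{N_b}\|\mathbf{B}\|^2 \tag{66}$$

where $r_1 = \max\{1, \log(n_b/2 + 1), q/4\}$. Assuming that $q \geq \max\{4\log(n_b/2 + 1), 4\}$ and, hence, $r_1 = q/4$, we can simplify (66) to

$$[\mathbb{E}(H_b^q)]^{1/q} \leq 6\sqrt{b^2 n_b}\,\sqrt{q} + \frac{2n_b}{N_b}\|\mathbf{B}\|^2. \tag{67}$$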
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To bound the third term, we use the upper bound in [11, Thm. 8] on the spectral norm of a random 
compression combined with the fact that rank(S^S^) < n&, which is a consequence of S^S b being of 
dimension n a x rtf,. This yields 



EfllSfSj 9 



1/9 



<3v^||SfB|| lj2 + 




nb » S ?B| 



(68) 



where r2 = max{2, 2 log n^, g/2}. Assuming that q > max{4 log n^, 4}, we can further upper-bound 
the RHS of ([68) to get 



[E(Z*)] 1 /«<^ v ^||sfB|| 12 + 



V2 
3 




^ii S H B , 



< 



x/2 




Vd 2 na^/q+ J^IIAII | B| 



(69) 
(70) 



where (69 1 follows from the fact that the magnitude of each entry of S^B is upper-bounded by d and, 



thus, ||SfB|| 12 < \/d 2 n a . To arrive at ([70"1) we used ||Sf B|| < ||Sf || ||B|| < ||A|| ||B||, which follows 
from the sub-multiplicativity of the spectral norm and the fact that the spectral norm of the submatrix 



S a of A cannot exceed that of A |27| Thm. 4.3.3]. We can now combine the upper bounds (65 ), (67 1, 



and ( 70 1 to obtain 



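\begin{align}
[\mathbb{E}(H^q)]^{1/q} &\le (n_a - 1)a + 6b\sqrt{n_b q} + \frac{2n_b}{N_b}\|B\|^2 + \frac{3}{\sqrt{2}}\sqrt{d^2 n_a}\,\sqrt{q} + \sqrt{\frac{n_b}{N_b}}\,\|A\|\,\|B\| \notag\\
&= \underbrace{\left(6b\sqrt{n_b} + \frac{3}{\sqrt{2}}\,d\sqrt{n_a}\right)}_{=:\,\alpha}\sqrt{q} + \underbrace{(n_a - 1)a + \frac{2n_b}{N_b}\|B\|^2 + \sqrt{\frac{n_b}{N_b}}\,\|A\|\,\|B\|}_{=:\,\beta} \notag
\end{align}

for all $q \ge Q_1 = \max\{4\log(n_b/2 + 1), 4\log n_a, 4\}$. Hence, Lemma 8 yields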

$$\mathbb{P}\{H \ge e^{1/4}(\alpha u + \beta)\} \le e^{-u^2/4}$$

for all $u \ge \sqrt{Q_1}$. In particular, under the assumption $N \ge e \approx 2.7$, the choice $u = \sqrt{4 s \log N}$ satisfies $u \ge \sqrt{Q_1}$ for $s \ge 1$. Straightforward calculations reveal that conditions (18) and (19) ensure that $e^{1/4}(\alpha u + \beta) \le 1/2$, which together with (62) leads to

$$\mathbb{P}\{\sigma_{\min}(S) < 1/\sqrt{2}\} \le \mathbb{P}\{H > 1/2\} \le \mathbb{P}\{H \ge e^{1/4}(\alpha u + \beta)\} \le e^{-u^2/4} = N^{-s}.$$
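The final step can be traced numerically. The sketch below evaluates $\alpha = 6b\sqrt{n_b} + (3/\sqrt{2})\,d\sqrt{n_a}$ and $\beta = (n_a - 1)a + (2n_b/N_b)\|B\|^2 + \sqrt{n_b/N_b}\,\|A\|\,\|B\|$, sets $u = \sqrt{4 s \log N}$, checks $u \ge \sqrt{Q_1}$, and computes the level $e^{1/4}(\alpha u + \beta)$, which must not exceed $1/2$ for the chain of inequalities to apply. The parameter values in the test are hypothetical, chosen only to exercise the formulas; they are not derived from conditions (18) and (19), which are not reproduced here. Natural logarithms are assumed and all names are illustrative.

```python
import math

def alpha_beta(n_a, n_b, N_b, a, b, d, norm_A, norm_B):
    """alpha and beta in [E(H^q)]^(1/q) <= alpha * sqrt(q) + beta."""
    alpha = 6 * b * math.sqrt(n_b) + 3 / math.sqrt(2) * d * math.sqrt(n_a)
    beta = ((n_a - 1) * a
            + 2 * n_b / N_b * norm_B ** 2
            + math.sqrt(n_b / N_b) * norm_A * norm_B)
    return alpha, beta

def tail_check(n_a, n_b, N_b, N, s, a, b, d, norm_A, norm_B):
    """Evaluate the last step of the argument for given (hypothetical) parameters."""
    alpha, beta = alpha_beta(n_a, n_b, N_b, a, b, d, norm_A, norm_B)
    u = math.sqrt(4 * s * math.log(N))
    q1 = max(4 * math.log(n_b / 2 + 1), 4 * math.log(n_a), 4.0)
    level = math.exp(0.25) * (alpha * u + beta)
    return {
        "u_admissible": u >= math.sqrt(q1),  # Lemma 8 needs u >= sqrt(Q_1)
        "level": level,                      # must be <= 1/2 for the final chain
        "tail": math.exp(-u ** 2 / 4),       # equals N^{-s}
    }
```

For example, with $n_a = 5$, $n_b = 20$, $N_b = 2\cdot 10^6$, $N = 4\cdot 10^6$, $s = 1$, $a = b = d = 10^{-3}$, and $\|A\| = \|B\| = \sqrt{2}$, the level is about $0.33$ and the tail equals $N^{-1} = 2.5 \cdot 10^{-7}$.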



Appendix H 
Prior Art 



A. Tropp's (M0) Model and (P0)-uniqueness
In [11] the following model was introduced.



Model (M0) for a signal $y = Dx$:

- The dictionary $D$ has coherence $d$.
- The vector $x$ has nonzero entries only in the positions corresponding to the columns of a sub-dictionary $S$ of $D$; furthermore, the entries of $x$ restricted to the chosen sparsity pattern are jointly continuous random variables.
- The sub-dictionary $S$ satisfies $\sigma_{\min}(S) \ge 1/\sqrt{2}$ and has $T \le d^{-2}/2$ columns.



The following theorem builds on (M0).

Theorem 9 ([11, Thm. 13]): Suppose that $y = Dx$ is a signal drawn from Model (M0). Then $x$ is almost surely the unique vector that satisfies the constraints

$$Dx = y \quad \text{and} \quad \|x\|_0 \le T.$$
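The size constraint in Model (M0) depends on the dictionary only through its coherence $d$. The Python sketch below computes $d$ for a small real dictionary and the largest integer $T$ allowed by $T \le d^{-2}/2$. The example dictionary $[I_4,\ H_4/2]$, with $H_4$ the $4 \times 4$ Sylvester Hadamard matrix, is our own illustration and not taken from [11]; the function names are hypothetical. The sketch does not check $\sigma_{\min}(S) \ge 1/\sqrt{2}$, which depends on the particular sub-dictionary $S$.

```python
import math

def coherence(columns):
    """max |<d_i, d_j>| over distinct columns, after normalizing each column to unit norm."""
    unit = []
    for c in columns:
        nrm = math.sqrt(sum(x * x for x in c))
        unit.append([x / nrm for x in c])
    n = len(unit)
    return max(abs(sum(x * y for x, y in zip(unit[i], unit[j])))
               for i in range(n) for j in range(i + 1, n))

def max_columns_m0(d):
    """Largest integer T with T <= d^{-2}/2 (the (M0) size constraint)."""
    return math.floor(d ** -2 / 2)

# Example dictionary D = [I_4, H_4/2] in R^4, listed column by column.
identity = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
# The Sylvester H_4 is symmetric, so its rows are also its columns.
D_cols = identity + [[h / 2 for h in row] for row in hadamard]
```

Every identity column has inner product $\pm 1/2$ with every scaled Hadamard column, so $d = 1/2$ and (M0) admits sub-dictionaries with at most $d^{-2}/2 = 2$ columns.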
B. Tropp's (M1) Model and Recovery via BP



In [11] the following model was introduced.



Model (M1) for a signal $y = Dx$:

- The dictionary $D$ has coherence $d$.
- The vector $x$ has nonzero entries only in the positions corresponding to the columns of a sub-dictionary $S$ of $D$; furthermore, the phases of its nonzero entries are i.i.d. and uniformly distributed on $[0, 2\pi)$ (the magnitudes need not be i.i.d.).
- The sub-dictionary $S$ satisfies $\sigma_{\min}(S) \ge 1/\sqrt{2}$ and has $T \le d^{-2}/[8(s + 1)\log N]$ columns ($s \ge 1$).



The following theorem builds on (M1).
Theorem 10 ([11, Thm. 14]): Suppose that $y = Dx$ is a signal drawn from Model (M1). Then $x$ is the unique solution of (BP) with probability at least $1 - 2N^{-s}$.
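Compared with (M0), the column budget in (M1) is smaller by the factor $1/[4(s+1)\log N]$. The sketch below evaluates the (M1) budget $\lfloor d^{-2}/[8(s+1)\log N] \rfloor$; the natural logarithm is an assumption (consistent with the condition $N \ge e$ used in the proof above), and the function name is hypothetical.

```python
import math

def max_columns_m1(d, N, s):
    """Largest integer T with T <= d^{-2} / [8 (s + 1) log N], the (M1) size constraint.

    d: coherence of D, N: number of columns of D, s >= 1: probability exponent.
    """
    if s < 1 or N < 2:
        raise ValueError("need s >= 1 and N >= 2")
    return math.floor(d ** -2 / (8 * (s + 1) * math.log(N)))
```

For instance, with $d = 0.01$ and $N = 10^4$, (M0) allows up to $d^{-2}/2 = 5000$ columns, whereas (M1) with $s = 1$ allows 67, and a tiny dictionary with $d = 1/2$ and $N = 8$ leaves no admissible column at all.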

If the requirements of both (M0) and (M1) are satisfied, then combining Theorems 9 and 10 yields the following statement: $x$ is the unique solution of both (P0) and (BP) applied to $y = Dx$ with probability at least $1 - 2N^{-s}$. Note, however, that both (M0) and (M1) require the sub-dictionary $S$ to have $\sigma_{\min}(S) \ge 1/\sqrt{2}$. Lemma 7 shows that for $D = [A\ B]$ and $S$ consisting of $n_a$ arbitrarily chosen columns of $A$ and $n_b$ randomly chosen columns of $B$, the sub-dictionary $S$ has $\sigma_{\min}(S) \ge 1/\sqrt{2}$ with probability at least $1 - N^{-s}$.